SECRETED HUMAN PROTEINS



This application claims the benefit of copending provisional application
Serial No. 60/032,757, filed December 11, 1996, which is incorporated herein by
reference.

TECHNICAL AREA OF THE INVENTION

The invention relates to the area of proteins. More particularly, the 
invention relates to human secreted proteins.



BACKGROUND OF THE INVENTION

Secreted proteins include such important proteins as growth factors, 
cytokines and their receptors, extracellular matrix proteins, and proteases. 
Nucleotide sequences encoding these proteins can be used to detect disease states in
which such proteins are implicated and to develop therapeutics for such diseases.
Thus, there is a need in the art for methods of identifying secreted proteins and the 
nucleotide sequences which encode them. 

SUMMARY OF THE INVENTION

It is an object of the invention to provide an isolated and purified human 
protein. 

It is yet another object of the invention to provide a fusion protein.

It is still another object of the invention to provide a preparation of
antibodies. 

It is even another object of the invention to provide an isolated and purified 
subgenomic polynucleotide. 

It is yet another object of the invention to provide an isolated gene. 

It is a further object of the invention to provide a DNA construct for 
expressing all or a portion of a human protein. 

It is still another object of the invention to provide a host cell comprising a 
DNA construct. 

It is another object of the invention to provide a homologously recombinant
cell.

It is even another object of the invention to provide a method of producing a
human protein.

It is another object of the invention to provide a method of identifying a 
secreted polypeptide which is modified by rough microsomes. 

These and other objects of the invention are provided by one or more of the 
embodiments described below. 

One embodiment of the invention provides an isolated and purified human 
protein. The isolated and purified human protein has an amino acid sequence 
selected from the group consisting of the amino acid sequences shown in SEQ ID
Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. 

Another embodiment of the invention provides an isolated and purified 
human protein having an amino acid sequence which is at least 85% identical to an 
amino acid sequence selected from the group consisting of the amino acid 
sequences shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, and 38.

Still another embodiment of the invention provides a polypeptide comprising 
at least 6 contiguous amino acids of an amino acid sequence selected from the 
group consisting of the amino acid sequences shown in SEQ ID Nos:20, 21, 22, 23, 
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. 



Even another embodiment of the invention provides a fusion protein. The
fusion protein comprises a first protein segment and a second protein segment fused
together by means of a peptide bond. The first protein segment consists of at least
6 contiguous amino acids selected from the group consisting of the amino acid
sequences shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, and 38.

Yet another embodiment of the invention provides a preparation of 
antibodies. The antibodies specifically bind to a human protein having an amino 
acid sequence selected from the group consisting of the amino acid sequences
shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, and 38.

Even another embodiment of the invention provides an isolated and purified 
subgenomic polynucleotide. The isolated and purified subgenomic polynucleotide 
has a nucleotide sequence selected from the group consisting of the nucleotide
sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, and 19.

Yet another embodiment of the invention provides an isolated and purified 
subgenomic polynucleotide consisting of at least 10 contiguous nucleotides selected 
from the group consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19.

Still another embodiment of the invention provides an isolated gene. The 
isolated gene corresponds to a cDNA sequence selected from the group consisting
of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, and 19.

Another embodiment of the invention provides a DNA construct for
expressing all or a portion of a human protein. The DNA construct comprises a
promoter and a polynucleotide segment. The polynucleotide segment encodes at
least 6 contiguous amino acids of a human protein having an amino acid sequence
selected from the group consisting of the amino acid sequences shown in SEQ ID
Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38.
The polynucleotide segment is located downstream from the promoter.
Transcription of the polynucleotide segment initiates at the promoter. 

Even another embodiment of the invention provides a host cell comprising a 
DNA construct. The DNA construct comprises a promoter and a polynucleotide 
segment. The polynucleotide segment encodes at least 6 contiguous amino acids of
a human protein having an amino acid sequence selected from the group consisting
of the amino acid sequences shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. The polynucleotide segment is 
located downstream from the promoter. Transcription of the polynucleotide 
segment initiates at the promoter. 

Still another embodiment of the invention provides a homologously 
recombinant cell having incorporated therein a new transcription initiation unit. The 
transcription initiation unit comprises in 5' to 3' order an exogenous regulatory
sequence, an exogenous exon, and a splice donor site. The transcription initiation 
unit is located upstream to a coding sequence of a gene. The gene comprises a 
nucleotide sequence selected from the group consisting of the nucleotide sequences 
shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
and 19. The exogenous regulatory sequence controls transcription of the coding 
sequence of the gene. 

Yet another embodiment of the invention provides a method of producing a 
human protein. A culture of a cell is grown. The cell comprises a DNA construct.
The DNA construct comprises a promoter and a polynucleotide segment. The 
polynucleotide segment encodes at least 6 contiguous amino acids of a human 
protein having an amino acid sequence selected from the group consisting of the 
amino acid sequences shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 
30, 31, 32, 33, 34, 35, 36, 37, and 38. The polynucleotide segment is located 
downstream from the promoter. Transcription of the polynucleotide segment 
initiates at the promoter. The protein is purified from the culture. 

Even another embodiment of the invention provides a method of producing 
a human protein. A culture of a cell is grown. The cell comprises a new 
transcription initiation unit. The transcription initiation unit comprises in 5' to 3'



WO 98/25959 PCT/US97/22787



order an exogenous regulatory sequence, an exogenous exon, and a splice donor 
site. The transcription initiation unit is located upstream to a coding sequence of a 
gene. The gene comprises a nucleotide sequence selected from the group consisting 
of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, and 19. The exogenous regulatory sequence controls 
transcription of the coding sequence of the gene. The protein is purified from the 
culture. 

Another embodiment of the invention provides a method of identifying a 
secreted polypeptide which is modified by rough microsomes. A population of 
cDNA molecules is transcribed in vitro whereby a population of cRNA molecules is 
formed. A first portion of the population of cRNA molecules is translated in vitro
in the absence of rough microsomes whereby a first population of polypeptides is 
formed. A second portion of the population of cRNA molecules is translated in 
vitro in the presence of rough microsomes whereby a second population of 
polypeptides is formed. The first population of polypeptides is compared with the 
second population of polypeptides. Polypeptide members of the second population 
which have been modified by the rough microsomes are detected. 

The present invention thus provides the art with a method for identifying 
secreted proteins or polypeptides, the amino acid sequences of nineteen novel 
human secreted proteins, and the nucleotide sequences which encode these proteins.
The invention can be used, inter alia, to produce secreted proteins for
therapeutic and diagnostic purposes. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The inventors have discovered a method for identifying secreted proteins or 
polypeptides. Secreted proteins or polypeptides include soluble proteins which can 
be transported across a membrane, such as a cell membrane, nuclear membrane, or 
membrane of the endoplasmic reticulum, as well as proteins which can be partially
secreted from a cell, such as membrane-bound receptors. 

Secreted proteins can contain a signal (or secretion leader) sequence,
located at the N-terminus and including at least several hydrophobic amino acids,
such as phenylalanine, methionine, leucine, valine, or tryptophan. Non-hydrophobic
amino acids can also be included in the signal sequence. Signal sequences are
described in von Heijne, J. Mol. Biol. 184:99-105 (1985) and Kaiser and Botstein,
Mol. Cell. Biol. 6:2382-2391 (1986). Secreted proteins can also be glycosylated by
post-translational modification. The presence of a signal sequence or the presence 
of glycosylation or both indicate that a particular protein is a secreted protein. 
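
By way of illustration only, the signal-sequence criterion described above can be sketched as a simple count of hydrophobic residues near the N-terminus. The residue set is the one listed in the preceding paragraph; the function name, the window length of 25 residues, and the threshold of 5 residues are assumptions made for this sketch and are not values taken from the present disclosure.

```python
# Hydrophobic residues named in the text: Phe, Met, Leu, Val, Trp
# (one-letter codes).
HYDROPHOBIC = set("FMLVW")


def has_candidate_signal(protein, window=25, min_hydrophobic=5):
    """Return True if the N-terminal window holds enough hydrophobic residues.

    window and min_hydrophobic are illustrative assumptions only.
    """
    n_terminus = protein.upper()[:window]
    count = sum(1 for residue in n_terminus if residue in HYDROPHOBIC)
    return count >= min_hydrophobic
```

A positive result from such a count is only a hint; the comparison with known signal sequences cited above remains the criterion stated in the text.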

In order to identify secreted proteins or polypeptides, the method of the 
invention exploits properties of microsomes, which are the closed vesicles that 
result from fragmentation of endoplasmic reticulum. Microsomes can be rough or 
smooth, depending on whether the endoplasmic reticulum from which they were 
derived is studded with ribosomes. Microsomes, particularly rough microsomes,
have the ability to perform post-translational modifications, such as glycosylation 
and cleavage of signal sequences from proteins or polypeptides. 

To identify secreted proteins, a population of complementary DNA (cDNA) 
molecules is transcribed in vitro to synthesize a population of complementary RNA
(cRNA) molecules. The cDNA molecules can be synthesized by reverse
transcription of mRNA molecules isolated from a particular cell or tissue type or
organism using, for example, a commercially available reverse transcriptase enzyme.
Alternatively, the reverse transcription reaction to form cDNA molecules can be
conducted on total RNA, without a preliminary purification of mRNA.

Any organism, such as a bacterium, plant, invertebrate, or vertebrate 
organism, can be used as a source of RNA. Particularly preferred sources of RNA
are mammals, most preferably humans. Tissues, such as liver, brain, kidney, spleen,
pancreas, or muscle, can be used as a source of RNA. Individual cell types, either
primary cells or members of established cell lines, such as HeLa, CHO, PC12, P19,
BHK, COS, or HepG2, are suitable sources of RNA. Tissues or primary cells
isolated from organisms at a particular stage in development can be used as RNA
sources. Stem cells, such as hematopoietic, neuronal, and embryonic stem cells, can
also be used as a source of RNA.

Total RNA or mRNA can be isolated using methods known in the art. Such
methods are described, inter alia, in Sambrook et al., Molecular Cloning, A
Laboratory Manual (2d ed., Cold Spring Harbor Press, N.Y., 1989), and
Ausubel et al., Current Protocols in Molecular Biology (Greene Publishing
Associates and John Wiley & Sons, N.Y., 1994). Techniques for RNA isolation
can be tailored for a particular organism or cell type, as is known in the art. 

Complementary DNA can optionally be obtained from a cDNA library. The 
cDNA library can be derived from the genome of any organism of interest, 
particularly a mammal or a human. Tissue- or cell type-specific cDNA libraries can 
also be used as a source of cDNA. 

Transcription of cDNA molecules in vitro to form cRNA molecules can be 
carried out using any methods known in the art. These methods include, for 
example, placing cDNA into a cloning vector containing a promoter, such as an 
SP6, T7, or T3 polymerase promoter, and transcribing the cDNA using the 
appropriate polymerase. A variety of commercial kits are available for this purpose. 

A first portion of the population of cRNA molecules can be translated in 
vitro, in the absence of rough microsomes, to form a first population of 
polypeptides which have not been post-translationally modified. A second portion 
of the population of cRNA molecules can be translated in vitro in the presence of 
rough microsomes. Under the conditions of the in vitro translation reaction, rough 
microsomes can cleave signal sequences from those polypeptides which comprise 
such sequences. Under the same conditions, rough microsomes can also glycosylate
those polypeptides which contain glycosylation sites. 

Methods of in vitro translation are those which are known in the art, such 
as translation in a reticulocyte lysate system, particularly a rabbit reticulocyte lysate. 
Reticulocyte lysate systems can be assembled in the laboratory or purchased 
commercially in kit form. 

Microsomes can be prepared by disruption of tissues or cells by 
homogenization, as is known in the art. If desired, rough and smooth microsomes 
can be separated using well-known techniques, such as sucrose density gradient 
sedimentation. Microsomes are also available commercially, for example,
the canine pancreatic microsomes available from Promega Corp., Madison, WI.

The first population of polypeptides can then be compared with the second
population of polypeptides. This comparison can be by means of, for example, one- 
or two-dimensional polyacrylamide gel electrophoresis, as is known in the art. 
Polypeptides separated in the gels can be detected by any means known in the art, 
such as staining with copper, silver, Coomassie Brilliant Blue, amido black, fast 
green FCF, Ponceau S, or a chromophoric label. Separated proteins can also be 
visualized using radioactive, chemiluminescent, fluorescent, or enzymatic tags 
incorporated into the proteins before separation. 

The gels can be dried or the proteins can be transferred to membranes, such 
as polyvinylidene difluoride membranes. Either the gels or membranes themselves 
or photographs of the gels or membranes can be compared by eye. Alternatively, 
the gels or membranes can be scanned, for example, with a densitometer and 
analyzed with the aid of a computer.

Polypeptide members of the second population of polypeptides, which have 
been modified by the rough microsomes, can be detected by any means available in 
the art. For example, a shift in the position of a polypeptide band can be observed, 
indicating an increase in molecular weight of a member of the second population 
compared with the corresponding polypeptide member of the first population. Such 
an increase in molecular weight indicates that the polypeptide member of the second 
population was glycosylated by the rough microsomes.

A shift in the position of a polypeptide band indicating a decrease in 
molecular weight of a member of the second population compared with the 
corresponding polypeptide member of the first population can also be observed. 
This decrease in molecular weight indicates that the polypeptide member of the 
second population contained a signal sequence which was cleaved by the rough 
microsomes. 
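
By way of illustration only, the interpretation of band shifts described in the two preceding paragraphs can be sketched as follows. The function name and the tolerance of 0.5 kDa are assumptions made for this sketch; in practice the tolerance would depend on the resolution of the gel system used.

```python
def classify_band_shift(mw_without, mw_with, tolerance=0.5):
    """Interpret the apparent molecular weight (kDa) of one polypeptide
    translated without and with rough microsomes.

    tolerance is an assumed measurement margin, not a value from the text.
    """
    delta = mw_with - mw_without
    if delta > tolerance:
        # Increase in molecular weight: glycosylation by the microsomes.
        return "glycosylated"
    if delta < -tolerance:
        # Decrease in molecular weight: cleavage of a signal sequence.
        return "signal sequence cleaved"
    return "no detectable modification"
```

A polypeptide that is both cleaved and glycosylated could show a net shift in either direction, so a comparison of this kind reports only the net change.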

Polypeptides which are modified by the rough microsomes are identified as 
secreted polypeptides. Optionally, quantities of cDNA molecules which encode 
secreted polypeptides can be obtained. Molecules of cDNA which encode 
polypeptides which are post-translationally modified by the rough microsomes can 
be placed into suitable vectors using standard recombinant DNA techniques and 
used to transform host cells. Many vectors are available for this purpose, such as
retroviral or adenoviral vectors and bacteriophage, as described below. 

Vectors comprising cDNA which encode secreted polypeptides can be 
introduced into host cells using techniques available in the art. These techniques 
include, but are not limited to, transferrin-polycation-mediated DNA transfer,
transfection with naked or encapsulated nucleic acids, liposome-mediated cellular
fusion, intracellular transportation of DNA-coated latex beads, protoplast fusion,
viral infection, electroporation, and calcium phosphate-mediated transfection. 

The host cells can be any host cells which are capable of propagating cDNA 
molecules. A variety of host cells, for example immortalized cell lines such as 
HeLa, CHO, or HEK, are available for this purpose.

Transformed host cells can be diluted serially and cultured to form individual 
colonies. Methods of culturing host cells and the media suitable for each host cell 
type are well known in the art. Preferably, each colony originates from a single 
transformed host cell. Separate preparations of cDNA from each colony can be 
prepared, as described above, and transcribed in vitro to form cRNA. The cRNA
can be translated to form secreted polypeptides, which can be purified as is known
in the art. If the preparation of secreted polypeptides from a colony contains more
than one species of polypeptide, the steps described above can be repeated until a
colony is obtained which contains cDNA encoding only a single species of
polypeptide. 

Complementary DNA molecules which encode secreted proteins can be 
sequenced using standard nucleotide sequencing techniques. The sequence of each 
cDNA molecule can be compared with known sequences in a database to determine 
whether the clone encodes a known or a novel secreted protein. 

The inventors have used the method of the invention to identify nineteen 
novel human secreted proteins. Amino acid sequences for these nineteen human 
secreted proteins are disclosed in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 
29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. Nucleotide sequences which encode the 
proteins are disclosed in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, and 19, respectively. 

Clones containing the cDNAs of the secreted proteins were deposited on
December 11, 1997, with the ATCC. Individual bacterial cells (E. coli) in this
composite deposit contain one or more of the polynucleotides encoding the secreted
proteins of the invention and can be retrieved using an oligonucleotide probe 
designed from the sequence for that particular polynucleotide, as provided herein. 
Each polynucleotide can be removed from the vector by performing an EcoRI/NotI
digestion (5' site, EcoRI; 3' site, NotI). The deposit submitted to the ATCC has 
been designated SECP 120997. The nucleotide sequences of these deposits and the 
amino acid sequences they encode are controlling in the event of a discrepancy 
between the amino acid and nucleotide sequences disclosed herein and those 
contained in the deposits.
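
By way of illustration only, the EcoRI/NotI excision described above can be sketched in silico. The recognition sequences GAATTC (EcoRI) and GCGGCCGC (NotI) are the standard ones; the function names and the example vector sequence are hypothetical, and the sketch ignores the exact cut positions within each site as well as any internal sites within the insert.

```python
ECORI_SITE = "GAATTC"    # EcoRI recognition sequence (5' site)
NOTI_SITE = "GCGGCCGC"   # NotI recognition sequence (3' site)


def find_sites(sequence, site):
    """Return the 0-based start position of every occurrence of site."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def excise_insert(vector_sequence):
    """Return the sequence lying between the first EcoRI site and the next
    NotI site, or None if either site is absent."""
    seq = vector_sequence.upper()
    five_prime = seq.find(ECORI_SITE)
    if five_prime == -1:
        return None
    insert_start = five_prime + len(ECORI_SITE)
    three_prime = seq.find(NOTI_SITE, insert_start)
    if three_prime == -1:
        return None
    return seq[insert_start:three_prime]


# Hypothetical vector fragment: flank, EcoRI site, insert, NotI site, flank.
example = "ttttGAATTCatgaaactgGCGGCCGCaaaa"
print(excise_insert(example))
```

An insert that itself contains an EcoRI or NotI site would be cut internally, which `find_sites` can be used to check before a digestion is planned.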

A purified and isolated subgenomic polynucleotide of the present invention 
comprises at least 10, 12, 15, 18, 20, 25, 30, 35, 40, 45, or 50 contiguous 
nucleotides selected from the nucleotide sequences shown in SEQ ID NOs:1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. The isolated and purified
subgenomic polynucleotides can comprise an entire nucleotide sequence selected
from the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, and 19. 

Subgenomic polynucleotides contain less than a whole chromosome and are 
preferably intron-free. Polynucleotides of the invention can be isolated and purified
free from other nucleotide sequences by standard nucleic acid purification 
techniques, using restriction enzymes and probes to isolate fragments comprising 
the coding sequences. 

Isolated genes corresponding to the cDNA sequences disclosed herein are 
also provided. Known methods can be used to isolate the corresponding genes
using the provided cDNA sequences. These methods include preparation of probes
or primers from the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19 for use in identifying or amplifying
the genes from human genomic libraries or other sources of human genomic DNA.

The coding sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, and 19 can be made using reverse transcriptase with
human mRNA as a template. Amplification by PCR can also be used to obtain the
polynucleotides, using either genomic DNA or cDNA as a template. Polynucleotide 
molecules of the invention can also be made using the techniques of synthetic 
chemistry given the sequences disclosed herein. The degeneracy of the genetic code 
permits alternate nucleotide sequences which will encode the amino acid sequences 
shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 
36, 37, and 38 to be synthesized. All such nucleotide sequences are within the 
scope of the present invention. 
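
By way of illustration only, the extent of the degeneracy referred to above can be estimated by counting the alternate nucleotide sequences that encode a given peptide. The sketch assumes the standard genetic code and excludes stop codons; the function name is hypothetical.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (one-letter codes; the 61 sense codons in total).
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encoding_sequences(peptide):
    """Number of distinct nucleotide sequences encoding the peptide."""
    return prod(CODONS_PER_RESIDUE[residue] for residue in peptide.upper())


# Met-Leu-Ser: 1 x 6 x 6 = 36 alternate coding sequences.
print(count_encoding_sequences("MLS"))
```

The count grows multiplicatively with peptide length, which is why the alternate sequences are claimed by reference to the encoded amino acid sequence rather than being listed individually.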

Polynucleotide molecules of the invention can be propagated in vectors and 
cell lines as is known in the art. Polynucleotide molecules can be on linear or 
circular molecules. They can be on autonomously replicating molecules or on 
molecules without replication sequences. For propagation, polynucleotides of the 
invention can be introduced into suitable host cells using any techniques available in 
the art, as described above. 

Subgenomic polynucleotides of the invention can be used to propagate 
additional copies of the polynucleotides or to express protein, polypeptides, or 
fusion proteins. The subgenomic polynucleotides disclosed herein can also be used, 
for example, as biomarkers for tissues or chromosomes, as molecular weight 
markers for DNA gels, to elicit immune responses, such as the formation of
antibodies against single- or double-stranded DNA, and in DNA-ligand interaction 
assays, to detect proteins or other molecules which interact with the nucleotide 
sequences. 

Disease states may be associated with alterations in the expression of genes 
which encode proteins of the invention. Polynucleotide sequences disclosed herein 
can also be used to determine the involvement of any of these sequences in disease 
states. For example, a gene in a diseased cell can be sequenced and compared with
a wild-type coding sequence of the invention. Alternatively, nucleotide probes can 
be constructed and used to detect normal or altered (mutant) forms of mRNA in a 
diseased cell. Subgenomic polynucleotides of the invention can also be used to 
design diagnostic tests and therapeutic compositions for diseases which may be 
associated with altered expression of these genes. 

The present invention provides both full-length and mature forms of the
disclosed proteins. Full-length forms of the proteins have the amino acid sequences
shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, and 38. The full-length forms of a protein can be processed enzymatically 
to remove a signal sequence, resulting in a mature form of the protein. Signal 
sequences can be identified by examination of the amino acid sequences disclosed 
herein and comparison with amino acid sequences of known signal sequences (see,
e.g., von Heijne, 1985; Kaiser & Botstein, 1986). Similarly, transmembrane
domains can be identified by examination of the amino acid sequences disclosed
herein. A transmembrane domain typically contains a long stretch of 15-30
hydrophobic amino acids. 
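
By way of illustration only, the examination of an amino acid sequence for a candidate transmembrane domain can be sketched as a search for uninterrupted runs of hydrophobic residues. The residue set used here is the non-polar family listed elsewhere in this description (alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan); treating that family as the hydrophobic set, requiring an uninterrupted run, and the function name are all assumptions made for this sketch.

```python
import re

# Non-polar family, one-letter codes (assumed here to be "hydrophobic").
NONPOLAR = "AVLIPFMW"


def hydrophobic_stretches(protein, min_len=15):
    """Return (start, end) index pairs, 0-based and end-exclusive, of every
    uninterrupted run of at least min_len non-polar residues."""
    pattern = re.compile("[%s]{%d,}" % (NONPOLAR, min_len))
    return [(m.start(), m.end()) for m in pattern.finditer(protein.upper())]
```

Natural transmembrane domains can contain occasional polar residues, so a strict run search of this kind will miss some domains; it is intended only to make the 15-30 residue criterion concrete.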

Other domains with predicted functions can also be identified. For example,
the protein having the amino acid sequence shown in SEQ ID NO:23 comprises a
Kunitz type serine protease inhibitor domain spanning amino acids 68 to 122 of
SEQ ID NO:23. The protein having the amino acid sequence shown in SEQ ID
NO:20 contains a zinc-finger motif.

Allelic variants of the disclosed subgenomic polynucleotides can occur and 
encode proteins which are identical, homologous, or substantially related to amino 
acid sequences disclosed herein (see below). 

Allelic variants of subgenomic polynucleotides of the invention can be
identified by hybridization of putative allelic variants with nucleotide sequences
disclosed herein under stringent conditions. For example, by using the following
wash conditions (2 x SSC, 0.1% SDS, room temperature twice, 30 minutes each;
then 2 x SSC, 0.1% SDS, 50 °C once, 30 minutes; then 2 x SSC, room
temperature twice, 10 minutes each), allelic variants can be identified which contain
at most about 25-30% basepair mismatches. More preferably, allelic variants 
contain 15-25% basepair mismatches, even more preferably 5-15% basepair 
mismatches. 
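
By way of illustration only, the basepair mismatch percentages referred to above can be computed for two aligned sequences of equal length as follows. The function names are hypothetical, the 30% default reflects the upper bound of "about 25-30%" stated above, and the sketch does not model hybridization itself; it only expresses the arithmetic.

```python
def percent_mismatch(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * mismatches / len(seq_a)


def within_mismatch_limit(seq_a, seq_b, max_percent=30.0):
    """True if the mismatch percentage does not exceed max_percent."""
    return percent_mismatch(seq_a, seq_b) <= max_percent
```

Sequences of unequal length would first have to be aligned by a method of the reader's choosing, which is outside the scope of this sketch.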

Protein variants of secreted proteins of the invention are also included. 
Amino acids which are not involved in regions which determine biological activity 
can be deleted or modified without affecting biological function. Preferably, protein
variants of the invention have amino acid sequences which are at least 85%, 90%,
or 95% identical to the amino acid sequences disclosed herein and have similar 
biological properties (see below). More preferably, the molecules are 98% 
identical. Modifications of interest in the protein sequences can include the 
alteration, substitution, replacement, insertion or deletion of a selected amino acid
residue. Proteins or derivatives can be either glycosylated or unglycosylated. 
Techniques for making such modifications are well known to those skilled in the art 
(see, e.g., U.S. 4,518,584). Alternatively, variants of proteins disclosed herein can
be constructed using techniques of synthetic chemistry or using recombinant DNA
methods. 

Preferably, amino acid changes in variants or derivatives of proteins of the 
invention are conservative amino acid changes, i.e., substitutions of similarly
charged or uncharged amino acids. A conservative amino acid change involves 
substitution of one amino acid for another amino acid of a family of amino acids 
which are structurally related in their side chains. Naturally occurring amino acids
are generally divided into four families: acidic (aspartate, glutamate), basic (lysine, 
arginine, histidine), non-polar (alanine, valine, leucine, isoleucine, proline, 
phenylalanine, methionine, tryptophan), and uncharged polar (glycine, asparagine,
glutamine, cystine, serine, threonine, tyrosine) amino acids. Phenylalanine,
tryptophan, and tyrosine are sometimes classified as aromatic amino acids. It is
reasonable to expect that an isolated replacement of a leucine with an isoleucine or
valine, an aspartate with a glutamate, a threonine with a serine, or a similar
replacement of an amino acid with a structurally related amino acid will not have a
major effect on the binding properties of the resulting molecule, especially if the
replacement does not involve an amino acid at a binding site involved in an
interaction of the protein. Non-naturally occurring amino acids can also be used to
form protein variants of the invention. 
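
By way of illustration only, the four families described above can be used to test whether a single substitution is conservative in the sense of this paragraph. The family membership follows the text; the one-letter code C is used for the residue the text lists as cystine, and the names `FAMILIES`, `FAMILY_OF`, and `is_conservative` are hypothetical.

```python
# The four families listed in the text, in one-letter codes.
FAMILIES = {
    "acidic": "DE",
    "basic": "KRH",
    "non-polar": "AVLIPFMW",
    "uncharged polar": "GNQCSTY",
}

# Reverse lookup: residue -> family name.
FAMILY_OF = {
    residue: name for name, members in FAMILIES.items() for residue in members
}


def is_conservative(original, replacement):
    """True if both residues belong to the same side-chain family."""
    return FAMILY_OF[original.upper()] == FAMILY_OF[replacement.upper()]
```

For example, the replacements named in the text (leucine with isoleucine, aspartate with glutamate, threonine with serine) all return True, whereas aspartate with lysine returns False.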

Whether an amino acid change results in a functional protein or polypeptide 
can readily be determined by assaying biological properties of the disclosed proteins 
or polypeptides, as described below. Species homologs of human subgenomic 
polynucleotides and proteins of the invention can also be identified by making 
suitable probes or primers and screening cDNA expression libraries from other
species, such as mice, monkeys, yeast, or bacteria. 

In the case of proteins which are membrane-bound, such as cell surface 
receptor proteins, soluble forms of the proteins can be obtained by deleting the 
nucleotide sequences which encode part or all of the intracellular and 
transmembrane domains of the protein and expressing a fully secreted form of the 
protein in a host cell. Techniques for identifying intracellular and transmembrane 
domains, such as homology searches, can be used to identify such domains in 
proteins of the invention using amino acid and nucleotide sequences disclosed 
herein. 

Polypeptides consisting of less than full-length proteins of the present
invention are also provided. Polypeptides of the invention can be linear or can be
cyclized, for example, as described in Saragovi et al., 1992, Bio/Technology 10,
773-778 and McDowell et al., 1992, J. Amer. Chem. Soc. 114, 9245-9253.
Polypeptides can be used, for example, as immunogens, diagnostic aids, or 
therapeutics, and to create fusion proteins, as described below. 

Polypeptide molecules consisting of less than the entire amino acid 
sequences shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 
33, 34, 35, 36, 37, and 38 are also provided. Such polypeptides comprise at least 6,
8, 10, 12, 15, 18, or 20 contiguous amino acids of an amino acid sequence shown in
SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
and 38. Polypeptide molecules of the invention can also possess minor amino acid 
alterations which do not substantially affect the ability of the polypeptides to 
interact with specific molecules, such as antibodies. 
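
By way of illustration only, the "contiguous amino acids" criterion can be expressed as a substring test. The function names and the example sequence are hypothetical; the example sequence is not one of the sequences disclosed herein.

```python
def is_contiguous_fragment(fragment, full_sequence, min_len=6):
    """True if fragment has at least min_len residues and occurs as an
    uninterrupted stretch of full_sequence."""
    fragment = fragment.upper()
    return len(fragment) >= min_len and fragment in full_sequence.upper()


def contiguous_fragments(full_sequence, length=6):
    """List every stretch of exactly `length` contiguous residues."""
    return [
        full_sequence[i:i + length]
        for i in range(len(full_sequence) - length + 1)
    ]


# Hypothetical 8-residue sequence: three 6-residue fragments are possible.
print(contiguous_fragments("MKLLVVLL", 6))
```

The same test applies unchanged to nucleotide sequences, with `min_len` set to 10 or more as recited for the subgenomic polynucleotides.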

Derivatives of the polypeptides, such as glycosylated forms, aggregative 
conjugates with other molecules, and covalent conjugates with unrelated chemical 
moieties, are also provided. Derivatives also include allelic variants, species 
variants, and muteins. Covalent derivatives are prepared by linkage of 
functionalities to groups which are found in the amino acid chain or at the N- or C-
terminal residue by means known in the art. Truncations or deletions of regions
which do not affect biological function are also encompassed. Truncated or deleted
polypeptides can be prepared synthetically or recombinantly, or by proteolytic
digestion of purified or partially purified secreted proteins of the invention. 

Fusion proteins comprising at least 6, 8, 10, 12, 15, 18, or 20 contiguous 
amino acids of the disclosed proteins can also be constructed. Human fusion
proteins are useful, inter alia, for generating antibodies against amino acid
sequences and for use in various assay systems. For example, fusion proteins can
be used to identify proteins which interact with secreted proteins of the invention
and influence their function. Physical methods, such as protein affinity
chromatography, or library-based assays for protein-protein interactions, such as the 
yeast two-hybrid or phage display systems, can be used for this purpose. Such 
methods are well known in the art and can also be used as drug screens. Fusion 
proteins can also be used to target molecules to a specific location in a cell or to 
cause a molecule to be secreted or to be anchored in a cellular membrane. 

Fusion proteins of the invention comprise two protein segments which are 
fused together with a peptide bond. The first protein segment comprises at least 6, 
8, 10, 12, 15, 18, or 20 contiguous amino acids selected from an amino acid
sequence shown in SEQ ID NOs:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, and 38. The first protein segment can also be a full-length
protein (comprising a signal sequence) or a mature protein (lacking a signal
sequence). The second protein segment can be a full-length protein or a protein
fragment. The second protein or protein fragment can be labeled with a detectable
marker, such as a radioactive, chemiluminescent, biotinylated, or fluorescent tag, or
can be an enzyme which will generate a detectable product. Enzymes suitable for
this purpose, such as β-galactosidase, are well known in the art.

Techniques for making fusion proteins, either recombinantly or by 
covalently linking two protein segments, are well known in the art. Fusion proteins 
comprising amino acid sequences of the invention can also be constructed, for 
example, using standard recombinant DNA methods to make a DNA construct 
which comprises contiguous nucleotides selected from SEQ ID NOs:1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19 and encoding the desired amino
acids in proper reading frame with nucleotides encoding the second protein
segment.
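
The requirement that the two coding segments be joined in proper reading frame can be illustrated with a minimal Python sketch. It assumes the standard genetic code; the function names and both nucleotide segments are hypothetical and are not taken from SEQ ID NOs:1-19. The sketch rejects a first segment whose length would shift the frame of the second segment, and rejects a fusion interrupted by a stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence, reading from its first nucleotide."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def fuse_in_frame(first_segment: str, second_segment: str) -> str:
    """Join two coding segments and return the encoded fusion polypeptide."""
    if len(first_segment) % 3 != 0:
        raise ValueError("first segment would shift the reading frame of the second")
    protein = translate(first_segment + second_segment)
    if "*" in protein[:-1]:
        raise ValueError("a stop codon interrupts the fusion")
    return protein


# Hypothetical segments, for illustration only.
first = "ATGGCTAAAGGTGAACTG"   # six codons
second = "GATTACAAGTAA"        # three codons followed by a stop codon
fusion = fuse_in_frame(first, second)
```

A first segment of, for example, seven nucleotides would be rejected, because the second segment would then be read out of frame.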

Proteins or polypeptides of the invention can be purified free from other 
components with which they are normally associated in a cell, such as
carbohydrates, lipids, subcellular organelles, or other proteins. An isolated protein 
or polypeptide is at least 90% pure. Preferably, the preparations are 95% or 99% 
pure. The purity of a preparation can be assessed, for example, by examining 
electrophoretograms of protein or polypeptide preparations at several pH values 
and at several polyacrylamide concentrations, as is known in the art. 

Standard biochemical methods can be used to isolate proteins of the 
invention from tissues which express the proteins or to isolate proteins, 
polypeptides, or fusion proteins from recombinant host cells into which a DNA 
construct has been introduced. Methods of protein purification, such as size 
exclusion chromatography, ammonium sulfate fractionation, ion exchange
chromatography, affinity chromatography, crystallization, electrofocusing, or 
preparative gel electrophoresis, are well known and widely used in the art. 

Alternatively, proteins, fusion proteins, or polypeptides of the invention can 
be produced by recombinant DNA methods or by synthetic chemical methods. 
Synthetic chemistry methods, such as solid phase peptide synthesis, can be used to 
synthesize proteins, fusion proteins, or polypeptides. For production of
recombinant proteins, fusion proteins, or polypeptides, coding sequences selected
from the nucleotide sequences shown in SEQ ID NOs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
11, 12, 13, 14, 15, 16, 17, 18, and 19 can be expressed in prokaryotic or eukaryotic 
host cells using expression systems known in the art. These expression systems 
include bacterial, yeast, insect, and mammalian cells (see below). 

The resulting expressed protein can then be purified from the culture 
medium or from extracts of the cultured cells using purification procedures known 
in the art. For example, for proteins fully secreted into the culture medium, cell-free
medium can be diluted with sodium acetate and contacted with a cation exchange
resin, followed by hydrophobic interaction chromatography. Using this method, the
desired protein, fusion protein, or polypeptide is typically greater than 95% pure.
Further purification can be undertaken, using, for example, any of the techniques
listed above. Proteins, fusion proteins, or polypeptides can also be tagged with an
epitope, such as a "Flag" epitope (Kodak), and purified using an antibody which
specifically binds to that epitope. 

It may be necessary to modify a protein produced in yeast or bacteria, for 
example by phosphorylation or glycosylation of the appropriate sites, in order to 
obtain a functional protein. Such covalent attachments can be made using known 
chemical or enzymatic methods. 

Proteins or polypeptides of the invention can also be expressed in cultured 
cells in a form which will facilitate purification. For example, a secreted protein or 
polypeptide can be expressed as a fusion protein comprising, for example, maltose 
binding protein, glutathione-S-transferase, or thioredoxin, and purified using a 
commercially available kit. Kits for expression and purification of such fusion
proteins are available from companies such as New England BioLabs, Pharmacia,
and Invitrogen. 

The coding sequences disclosed herein can also be used to construct 
transgenic animals, such as cows, goats, pigs, or sheep. Female transgenic animals 
can then produce proteins, polypeptides, or fusion proteins of the invention in their 
milk. Methods for constructing such animals are known and widely used in the art. 

Isolated proteins, polypeptides, or fusion proteins of the invention can be
used to obtain a preparation of antibodies which specifically bind to epitopes 
comprising amino acid sequences of the invention. Antibodies of the invention can 
be used, for example, to detect proteins, polypeptides, or fusion proteins of the 
invention which are secreted into culture medium or to identify tissues or cells 
which express these molecules. The antibodies can be polyclonal or monoclonal or 
can be single chain antibodies. Techniques for raising polyclonal and monoclonal 
antibodies and for constructing single chain antibodies are well known in the art. 

Antibodies of the invention bind specifically to epitopes comprising amino 
acid sequences of the invention, preferably to epitopes not present on other 
proteins. Typically a minimum number of contiguous amino acids to encode an 
epitope is 6, 8, or 10. However, more amino acids can be part of an epitope, for
example, at least 15, 25, or 50, especially to form epitopes which involve non-
contiguous residues. Specific binding antibodies do not detect other proteins on
Western blots of proteins or in immunocytochemical assays. With epitopes which
do not comprise amino acid sequences of the invention, specific binding antibodies
provide a signal at least ten-fold lower than the signal provided with epitopes of
the invention. Antibodies
which bind specifically to secreted proteins of the invention include those that bind
to mature or full-length proteins, to polypeptides or degradation products, to fusion
proteins, or to protein variants. In a preferred embodiment of the invention, the
antibodies immunoprecipitate the desired protein, fusion protein, or polypeptide
from solution and react with the protein, fusion protein, or polypeptide on Western
blots of polyacrylamide gels.

Techniques for purifying antibodies are those which are available in the art. 
In a preferred embodiment, antibodies are affinity purified by passing the antibodies
over a column to which amino acid sequences of the invention are bound. The
bound antibody is then eluted, for example using a buffer with a high salt
concentration. Any such technique may be chosen to purify antibodies of the 
invention. 

The invention also provides DNA constructs for expressing all or a portion
of a protein of the invention in a host cell. The DNA construct comprises a
promoter which is functional in the particular host cell selected. The skilled artisan
can readily select an appropriate promoter from the large number of cell type-
specific promoters known and used in the art. The DNA construct can also contain
a transcription terminator which is functional in the host cell.

The expression construct comprises a polynucleotide segment which 
encodes all or a portion of a human protein encoded by SEQ ID NOs: 1, 2, 3, 4, 5, 
6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19 or a variant thereof. The
polynucleotide segment is located downstream from the promoter. Transcription of
the polynucleotide segment initiates at the promoter. DNA constructs can be linear 
or circular and can contain sequences, if desired, for autonomous replication. 

The host cell comprising the DNA construct can be any suitable prokaryotic 
or eukaryotic cell. Expression systems in bacteria include those described in Chang
et al., Nature (1978) 275: 615; Goeddel et al., Nature (1979) 281: 544; Goeddel et
al., Nucleic Acids Res. (1980) 8: 4057; EP 36,776; U.S. 4,551,433; deBoer et al.,
Proc. Natl. Acad. Sci. USA (1983) 80: 21-25; and Siebenlist et al., Cell (1980) 20:
269.

Expression systems in yeast include those described in Hinnen et al., Proc.
Natl. Acad. Sci. USA (1978) 75: 1929; Ito et al., J. Bacteriol. (1983) 153: 163;
Kurtz et al., Mol. Cell. Biol. (1986) 6: 142; Kunze et al., J. Basic Microbiol.
(1985) 25: 141; Gleeson et al., J. Gen. Microbiol. (1986) 132: 3459; Roggenkamp
et al., Mol. Gen. Genet. (1986) 202: 302; Das et al., J. Bacteriol. (1984) 158:
1165; De Louvencourt et al., J. Bacteriol. (1983) 154: 737; Van den Berg et al.,
Bio/Technology (1990) 8: 135; Kunze et al., J. Basic Microbiol. (1985) 25: 141;
Cregg et al., Mol. Cell. Biol. (1985) 5: 3376; U.S. 4,837,148; U.S. 4,929,555;
Beach and Nurse, Nature (1981) 300: 706; Davidow et al., Curr. Genet. (1985) 10:
380; Gaillardin et al., Curr. Genet. (1985) 10: 49; Ballance et al., Biochem.
Biophys. Res. Commun. (1983) 112: 284-289; Tilburn et al., Gene (1983) 26: 205-
221; Yelton et al., Proc. Natl. Acad. Sci. USA (1984) 81: 1470-1474; Kelly and
Hynes, EMBO J. (1985) 4: 475-479; EP 244,234; and WO 91/00357.

Expression of heterologous genes in insects can be accomplished as 
described in U.S. 4,745,051; Friesen et al. (1986) "The Regulation of Baculovirus
Gene Expression" in: The Molecular Biology of Baculoviruses (W. Doerfler,
ed.); EP 127,839; EP 155,476; Vlak et al., J. Gen. Virol. (1988) 69: 765-776;
Miller et al., Ann. Rev. Microbiol. (1988) 42: 177; Carbonell et al., Gene (1988)
73: 409; Maeda et al., Nature (1985) 315: 592-594; Lebacq-Verheyden et al., Mol.
Cell. Biol. (1988) 8: 3129; Smith et al., Proc. Natl. Acad. Sci. USA (1985) 82:
8404; Miyajima et al., Gene (1987) 58: 273; and Martin et al., DNA (1988) 7: 99.
Numerous baculoviral strains and variants and corresponding permissive insect host
cells from hosts are described in Luckow et al., Bio/Technology (1988) 6: 47-55;
Miller et al., in Genetic Engineering (Setlow, J.K. et al. eds.), Vol. 8 (Plenum
Publishing, 1986), pp. 277-279; and Maeda et al., Nature (1985) 315: 592-594.

Mammalian expression can be accomplished as described in Dijkema et al.,
EMBO J. (1985) 4: 761; Gorman et al., Proc. Natl. Acad. Sci. USA (1982b) 79:
6777; Boshart et al., Cell (1985) 41: 521; and U.S. 4,399,216. Other features of
mammalian expression can be facilitated as described in Ham and Wallace, Meth.
Enz. (1979) 58: 44; Barnes and Sato, Anal. Biochem. (1980) 102: 255; U.S.
4,767,704; U.S. 4,657,866; U.S. 4,927,762; U.S. 4,560,655; WO 90/103430, WO
87/00195, and U.S. RE 30,985.

DNA constructs of the invention can be introduced into host cells using any
technique known in the art. These techniques include transferrin-polycation-
mediated DNA transfer, transfection with naked or encapsulated nucleic acids,
liposome-mediated cellular fusion, intracellular transportation of DNA-coated latex
beads, protoplast fusion, viral infection, electroporation, and calcium phosphate-
mediated transfection.

Alternatively, expression of an endogenous gene encoding a protein of the 
invention can be manipulated by introducing by homologous recombination a DNA 
construct comprising a transcription unit in frame with the endogenous gene, to
form a homologously recombinant cell comprising the transcription unit. The
transcription unit comprises a targeting sequence, a regulatory sequence, an exon,
and an unpaired splice donor site. The new transcription unit can be used to turn
the endogenous gene on or off as desired. This method of affecting endogenous
gene expression is taught in U.S. 5,641,670, which is incorporated herein by
reference. 

The targeting sequence is a segment of at least 10, 12, 15, 20, or 50
contiguous nucleotides selected from the nucleotide sequences shown in SEQ ID
NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. The
transcription unit is located upstream to a coding sequence of the endogenous 
gene. The exogenous regulatory sequence directs transcription of the coding 
sequence of the endogenous gene. 
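
Whether a candidate targeting sequence meets the length and contiguity conditions stated above can be checked with a short Python sketch. The function name and the candidate segments are hypothetical; the reference string is the first sixty nucleotides printed for SEQ ID NO:5 in the sequence listing, used here only as sample data.

```python
def is_targeting_segment(candidate: str, reference: str, min_length: int = 10) -> bool:
    """True if `candidate` is a run of at least `min_length` contiguous
    nucleotides found in `reference` (spaces and case are ignored)."""
    candidate = candidate.upper().replace(" ", "")
    reference = reference.upper().replace(" ", "")
    return len(candidate) >= min_length and candidate in reference


# First row of SEQ ID NO:5 as printed in the sequence listing.
reference = (
    "GAATTCGGCA CGAGGGCCAT GGCCGGGCTA TCCCGCGGGT CCGCGCGCGC ACTGCTCGCC"
)

long_enough = is_targeting_segment("GGCCGGGCTATCCCG", reference)   # 15 nucleotides
too_short = is_targeting_segment("GGCCGGG", reference)             # 7 nucleotides
absent = is_targeting_segment("AAAAAAAAAAAA", reference)           # not in reference
```

The `min_length` argument can be set to 12, 15, 20, or 50 to apply the other thresholds recited above.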

Secreted proteins of the invention have a variety of uses. For example, 
secreted proteins can be used in assays to determine biological activities, such as 
cytokine, cell proliferation, or cellular differentiation activities, tissue growth or
regeneration, activin or inhibin activity, chemotactic or chemokinetic activity,
hemostatic or thrombolytic activity, receptor/ligand activity, tumor inhibition, or 
anti-inflammatory activity. Assays for these activities are known in the art and are 
disclosed, for example, in U.S. 5,654,173, which is incorporated herein by 
reference. 

Proteins of the invention can also be used as biomarkers, to identify tissues 
or cell types which express the proteins, or a stage- or disease-specific alteration in 
protein expression. Proteins of the invention can be used in protein interaction 
assays, to identify ligands or binding proteins. Compounds which affect the
biological activities of the secreted proteins or their ability to interact with specific 
ligands can be identified using proteins of the invention in screening assays. 
Proteins and antibodies of the invention can also be used to design diagnostic tests 
and therapeutic compositions for diseases which may be associated with altered 
expression of these proteins. Fusion proteins comprising, for example, signal 
sequences or transmembrane domains of the disclosed proteins, can be used to 
target other protein domains to cellular locations in which the domains are not 
normally found, such as bound to a cellular membrane or secreted extracellularly. 

Further objects, features, and advantages of the present invention will 
readily occur to the skilled artisan provided with the disclosure above. 

SYNOPSIS OF THE INVENTION

1. An isolated and purified human protein having an amino acid
sequence selected from the group consisting of the amino acid sequences shown in
SEQ ID NOs:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
and 38.

2. An isolated and purified human protein having an amino acid
sequence which is at least 85% identical to an amino acid sequence selected from
the group consisting of the amino acid sequences shown in SEQ ID NOs:20, 21, 22,
23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38.

3. The isolated and purified human protein of item 2 wherein the amino
acid sequence is at least 90% identical.

4. The isolated and purified human protein of item 2 wherein the amino 
acid sequence is at least 95% identical. 

5. The isolated and purified human protein of item 2 wherein the amino
acid sequence is at least 98% identical.

6. An isolated and purified human polypeptide comprising at least 6
contiguous amino acids of an amino acid sequence selected from the group
consisting of the amino acid sequences shown in SEQ ID NOs:20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38.

7. A fusion protein comprising a first protein segment and a second
protein segment fused together by means of a peptide bond, wherein the first
protein segment consists of at least 6 contiguous amino acids selected from the
group consisting of the amino acid sequences shown in SEQ ID NOs:20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38.

8. A preparation of antibodies which specifically bind to the human 
protein of item 1. 

9. The preparation of antibodies of item 8 wherein the antibodies are
monoclonal.

10. The preparation of antibodies of item 8 wherein the antibodies are
polyclonal.

11. The preparation of antibodies of item 8 wherein the antibodies are
single chain antibodies.

12. An isolated and purified subgenomic polynucleotide having a
nucleotide sequence selected from the group consisting of the nucleotide sequences
shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
and 19.

13. An isolated and purified subgenomic polynucleotide consisting of at
least 10 contiguous nucleotides of a nucleotide sequence selected from the group
consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19.

14. An isolated gene corresponding to a cDNA sequence selected from 
the group consisting of the nucleotide sequences shown in SEQ ID NOs: 1, 2, 3, 4, 
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19.

15. A DNA construct for expressing all or a portion of a human protein
having an amino acid sequence selected from the group consisting of the amino acid
sequences shown in SEQ ID NOs:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, and 38, comprising: 

a promoter; and 

a polynucleotide segment encoding at least 6 contiguous amino acids 
of the human protein, wherein the polynucleotide segment is located downstream 
from the promoter, wherein transcription of the polynucleotide segment initiates at 
or 3' to the promoter. 

16. A host cell comprising a DNA construct comprising: 
a promoter; and 

a polynucleotide segment encoding at least 6 contiguous amino acids 
of a human protein having an amino acid sequence selected from the group 
consisting of the amino acid sequences shown in SEQ ID Nos:20, 21, 22, 23, 24, 
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38, wherein the 
polynucleotide segment is located downstream from the promoter and wherein
transcription of the polynucleotide segment initiates at or 3' to the promoter. 

17. A homologously recombinant cell having incorporated therein a new 
transcription initiation unit, wherein the new transcription initiation unit comprises 
in 5' to 3' order: 

(a) an exogenous regulatory sequence; 

(b) an exogenous exon; and 

(c) a splice donor site, 

wherein the transcription initiation unit is located upstream to a coding sequence of
a gene, wherein the gene comprises a nucleotide sequence selected from the group
consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19, and wherein the exogenous regulatory
sequence controls transcription of the coding sequence of the gene.

18. A method of producing a human protein, comprising the steps of:
growing a culture of a cell comprising a DNA construct comprising
(1) a promoter and (2) a polynucleotide segment encoding at least 6 contiguous
amino acids of a human protein having an amino acid sequence selected from the
group consisting of the amino acid sequences shown in SEQ ID NOs:20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38, wherein the
polynucleotide segment is located downstream from the promoter and wherein
transcription of the polynucleotide segment initiates at or 3' to the promoter; and
purifying the protein from the culture.

19. A method of producing a human protein, comprising the steps of:
growing a culture of a homologously recombinant cell having
incorporated therein a new transcription initiation unit, wherein the new
transcription initiation unit comprises in 5' to 3' order:

(a) an exogenous regulatory sequence;

(b) an exogenous exon; and

(c) a splice donor site,

wherein the transcription initiation unit is located upstream to a coding sequence of
a gene, wherein the gene comprises a nucleotide sequence selected from the group
consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19 and wherein the exogenous regulatory
sequence controls transcription of the coding sequence of the gene; and
purifying the protein from the culture.

20. A method of identifying a secreted polypeptide which is modified by
rough microsomes, comprising the steps of:

transcribing in vitro a population of cDNA molecules whereby a
population of cRNA molecules is formed;

translating a first portion of the population of cRNA molecules in
vitro in the absence of rough microsomes whereby a first population of polypeptides
is formed;

translating a second portion of the population of cRNA molecules in
vitro in the presence of rough microsomes whereby a second population of
polypeptides is formed;

comparing the first population of polypeptides with the second
population of polypeptides; and

detecting polypeptide members of the second population which have
been modified by the rough microsomes.

21. The method of item 20 wherein the population of cDNA molecules
is synthesized by reverse transcription of a population of mRNA molecules.

22. The method of item 21 wherein the mRNA molecules are isolated
from a mammal.

23. The method of item 22 wherein the mRNA molecules are isolated
from a human.

24. The method of item 20 wherein the population of cDNA molecules
is obtained from a cDNA library.

25. The method of item 24 wherein the cDNA library is derived from a
mammalian genome.

26. The method of item 25 wherein the cDNA library is derived from a
human genome.
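
The comparing and detecting steps of item 20 can be sketched in Python under a stated assumption: that the two populations are compared by the apparent molecular mass of each translation product, for example as estimated from a gel. That readout, the function name, the clone labels, the masses, and the tolerance are all hypothetical and serve only to illustrate the comparison.

```python
def modified_members(
    without_microsomes: dict[str, float],
    with_microsomes: dict[str, float],
    tolerance_kda: float = 1.0,
) -> dict[str, float]:
    """Return clone -> shift in apparent mass (kDa) for clones whose product
    differs between the two translation reactions by more than the tolerance."""
    shifts = {}
    for clone, mass_without in without_microsomes.items():
        mass_with = with_microsomes.get(clone)
        if mass_with is None:
            continue  # no product observed in the second reaction
        shift = mass_with - mass_without
        if abs(shift) > tolerance_kda:
            shifts[clone] = shift
    return shifts


# Hypothetical apparent masses in kDa, for illustration only.
first_population = {"clone_a": 30.0, "clone_b": 45.0, "clone_c": 22.0}
second_population = {"clone_a": 27.5, "clone_b": 45.2, "clone_c": 26.0}
detected = modified_members(first_population, second_population)
```

In this hypothetical data set, a negative shift would be consistent with removal of a signal sequence and a positive shift with an added modification such as glycosylation; the sketch itself does not distinguish the cause.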
(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE: 11-DEC-1997

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 60/032757 

(B) FILING DATE: 11-DEC-1996



(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Kagan, Sarah A 

(B) REGISTRATION NUMBER: 32141 

(C) REFERENCE/DOCKET NUMBER:
2441.39505; 1369.002; 1452.001

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 202-508-9100 

(B) TELEFAX: 202-508-9299 

(C) TELEX: 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2063 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

6AATTCGGCA CGAGGCCTCA GTCTTCCAGG GCGGCGGTGG GTGTCCGCTT CTCTCTGCTC 60 

TTCGACTGCA CCGCACTCGC GCGTGACCCT GACTCCCCCT AGTCAGCTCA GCGGTGCTGC 120 

CATGGCGTGG CGGCGGCGCG AAGCCGGCGT CGGGGCTCGC GGCGTGTTGG CTCTGGCGTT 180 

GCTCGCCCTG GCCCTGTGCG TGCCCGGGGC CCGGGGCCGG GCTCTCGAGT GGTTCTCGGC 240

CGTGGTAAAC ATCGAGTACG TGGACCCGCA GACCAACCTG ACGGTGTGGA GCGTCTCGGA 300

GAGT6GCCGC TTCGGCGACA GCTCGCCCAA GGAGGGCGCG CATGGCCTGG TGGGCGTCCC 360

GTGGGCGCCC GGCGGAGACC TCGAGGGCTG CGCGCCCGAC ACGCGCTTCT TCGTGCCCGA 420

GCCC6GCG6C CGAGGGGCCG CGCCCTGGGT CGCCCTGGTG GCTCGTGGGG GCTGCACCTT 480

CAAGGACAA6 GTGCTGGTGG CGGCGCGGAG GAACGCCTCG GCCGTCGTCC TCTACAATGA 540

6GAGCGCTAC GGGAACATCA CCTTGCCCAT GTCTCACGOG GGAACAGGAA ATATAGTGGT 600

CATTATGATT AGCTATCCAA AAGGAAGAGA AATTTTGGAG CTGGTGCAAA AAGGAATTCC 660

A6TAACGATG ACCATAGGGG TTGGCACCCX^ GCATGTACAG GAGTTGATCA GCGGTCAGTC 720

TGTGGTGTTT GTGGCCATTG CCTTCATCAC CATGATGATT ATCTCGTTAO CCTGGCTAAT 780

ATTTTACTAT ATACAGCGTT TCCTATATAC TGGCTCTCAG ATTGGAAGTC AGAGCCATAG 840

AAAAGAAACT AAGAAAGTTA TTGGCCA6CT TCTACTTCAT ACTGTAAAGC ATGGAGAAAA 900

GGGAATTGAT GTTGATGCTG AAAATTGTGC AGTGTGTATT GAAAATTTCA AAGTAAAGGA 960

TATTATTAGA ATTCTGCCAT GCAAGCATAT TTTTCATAGA ATATGCATTG ACCCATGGCT 1020

TTTGGATCAC CGAACATGTC C7UVTGTGTAA ACTTGATGTC ATCAAAGCCC TAGGATATTG 1080

GGGAGAGCCT G6GGATGTAC AGGAGATGCC TGCTCCAGAA TCTCCTCCTG GAAGGGATCC 1140

AGCTGCAAAT TTGAGTCTAG CTTTACCAGA TGATGACGGA AGTGATGACA GCAGTCCACC 1200

ATCAGCCTCC CCTGCTGAAT CTGAGCCACA GTGTGATCCC AGCTTTAAAG GAGATGCAGG 1260

AGAAAATACG GCATTGCTAG AAGCCGGCAG GAGTGACTCT CGGCATGGAG GACCGATCTC 1320

CXAGCACACG TGCCCACTGA AGTGGCACCA ACAGAAGTTT 66CTTGAACT AAAGGAGATT 1380

TTATTTTTTT TACTTTAGCA CATAATTTGT ATATTTGAAA ATAATGTATA TTATTTTACC 1440

TATTAGATTC TGATTTGATA TACAAAGGAC TAAGATATTT TCTTCTTGAA GAGACTTTTC 1500

GATTAGTCCT CATATATTTA TCTACXAAAA TAGAGTGTTT ACCATGAACA GTGTGTT6CT 1560

TCAGACTATT ACAAAGACAA CTGGGGCAGG TACTCTAATA TAAAG6ACAG GTGG.TGTTTC 1620

TAAATAATTG GCTGCTATGG TTCTGTAAAA ACCAGTTAAT TCTATTTTTC AAGGTTTTTG 1680

GCAAAGCACA TCAATGTTAG ACTAGTTGAA GTGGAATTGT ATAATTCAAT TCX5ATAATTG 1740

ATCTCATGGG CTTTCCCTG6 AGGAAAGGTT TTTTTTGTTG TTTTTTTTTT AAGAACTTGA 1800

AACTTGTAAA CTGAGATGTC TGTAGCTTTT TTGCCCATCT GTAGTGTATG TGAAGATTTC 1860

AAAACCTGAG AGCACTTTTT CTTTGTTTAG AATTATGAGA AAGGCACTAG ATGACTTTAG 1920

GATTTGCATT TTTCCCTTTA TTGCCTCATT TCTTGTGACG CCTTGTTGGG GAGGGAAATC 1980

TGTTTATTTT TTCCTACAAA TAAAAAGCTA AGATTCTATA TCGCA7UVAAA AAAAAAAAAA 2040

AAAAAAAAIU^ TTCCTGCGGC OGC 2063



(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1328 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

6AATT06GCA CGAGGTA6GC AAGGOATA2UV AAG6CACCTA AGGCCCTTTT GCAATAA6AA 60 

6CCA6ATGGA TAAA66AAGT GCTCGTCACC CT6GAGGTGT ACT6GTTTGG 6GAA6GTCCC 120 

CG6CCCCCAC AGCCCTCTGG GGAGCCTCAC CCTOGCTCTC CCCACTCACC TCAGCCCTCA 180 

6GCAGCCCCT CCACAGGGCC CCTCTCCTCC CTG6ACAGCT CTCCTG6TCT CCCC6TCCCC 240 

T6GAGAA6AA CAAGGCCAT6 GGTC6GCCCC TGCTGCTOCC CCTGCTGCTC CTCCT6CAGC 300 

CGCCAGCATT TCTGCAGCCT GGTGGCTCCA CAGGATCTGG TCCAAGCTAC CTTTATGGG6 360 

TCACTCAACC AAAACACCTC TCAGCCTCCA TGGGTGGCTC TGTGGAAATC CCCTTCTCCT 420 

TCTATTACCC CTGGGAGTTA GCCATAGTTC CCAACGTGAG AATATCCTGG AGACGGGGCC 480 

ACTTCCACGG GCAGTCCTTC TACAGCACAA GGCCGCCTTC CATTCACAAG GATTATGTGA 540 

ACCGGCTCTT TCT6AACTGG ACAGAGGGTC AGGAGAGCGG CTTCCTCAGG ATCTCAAACC 600 

TGCG6AA66A GGACCAGTCT GTGTATTTCT 6CCGAGTCGA GCTGGACACC CGGA6ATCA6 660 

GGAGGCAGCA GTT6CAGTCC ATCAAG66GA CCAAACTCAC CATCACCCA6 GCTGTCACAA 720 

CCACCACCAC CTGGAGGCCC AGCAGCACAA CCACCATAGC CGGCCTCAGG GTCACAGAAA 780 

6CAAAGGGCA CTCAGAATCA T6GCACCTAA 6TCTGGACAC TGCCATCAGG 6TTGCATTG6 840 

CT6TC6CTGT GCTCAAAACT GTCAXTTTO6 GACT6CTGTG CCTCCTCCTC CTOTG6T6GA 900

66AGAAGGAA AG6TAGCAGG GC6CCAAGCA 6TOACTTCTG ACCAACAGA6 TGTG6G6AGA 960

AGGGATGTGT ATTAGCCCCG GAGGACGTGA TGTGAGACCC GCTTGTGAGT CCTCCACACT 1020 

CGTTCCCCAT TGGCAAGATA CATGGAGAGC ACCCTGAGGA CCTTTAAAAG GCAAAGCCGC 1080 

AAGGCAG7VAG GAGGCTGGGX CCCTGAATCA CCGACTGGAG GAGAGTTACC TACAAGAGCC 1140 

TTCATCCAGG AGCATCCACA CTGCAATGAT ATAGGAATGA GGTCTGTUVCT CCACTGAATT 1200 

AAACCACTGG CATTTGGGGG CTGTTTATTA TAGCAGTGCA AAGAGTTCCT TTATCCTCCC 1260 

CAAGGATGGA AAAATACAAT TTATTTTGCT TACCATAAAA AAAAAAAAAA AAAAATXCCT 1320 

GCGGCCGC 1328 
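
The running totals printed at the end of each row of a listing can be checked against the residues actually present in the row, which is one way to spot rows damaged in reproduction. The following Python sketch is illustrative only; the function name is hypothetical, and the two rows are copied from the listing for SEQ ID NO:2 above (the rows ending at 1020 and 1080).

```python
def check_running_counts(rows: list[str], start: int = 0) -> list[tuple[int, int]]:
    """Return (counted, printed) pairs for rows whose printed total is wrong.

    Each row is a series of residue blocks followed by the printed running total.
    `start` is the total printed on the row preceding the first row given.
    """
    mismatches = []
    total = start
    for row in rows:
        *blocks, printed = row.split()
        total += sum(len(block) for block in blocks)
        if total != int(printed):
            mismatches.append((total, int(printed)))
            total = int(printed)  # resynchronise on the printed value
    return mismatches


# Rows copied from SEQ ID NO:2; the preceding row ends at 960.
rows = [
    "AGGGATGTGT ATTAGCCCCG GAGGACGTGA TGTGAGACCC GCTTGTGAGT CCTCCACACT 1020",
    "CGTTCCCCAT TGGCAAGATA CATGGAGAGC ACCCTGAGGA CCTTTAAAAG GCAAAGCCGC 1080",
]
problems = check_running_counts(rows, start=960)
```

An empty result means that every printed total agrees with the number of residues counted; the final row of SEQ ID NO:2, eight residues after a total of 1320, likewise agrees with the stated length of 1328 base pairs.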

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1689 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:



GAATTCGGCJl CGAGGGCAAG ATTCGATACA AAACCAATGA ACCTGTGT6G GAGGAAAACT 60

TCACTTTCTT CATTCACAAT CCCAAGCGCC AGGACCTTGA AGTTGAGGTC AGAGACGAGC 120

A6CACCA6T6 TTCCCTGGGG AACCTGAAGG TCCCCCTCAG CCAGCTGCTC ACCAGTGAG6 180

ACATGACTGT GAGCCAGCGC TTCCAGCTCA GTAACTCGGG TCCAAACAGC ACCATCAAGA 240

T6AAGATT6C CCTGCGGGTG CTCCATCTCG AAAAGCGAGA AAGGCCTCCA 6ACCACCAAC 300

ACTCAGCTCA AGTCAAACGT CCCTCTGTGT CCAAAGAGGG GAGGAAAACA TCCATCAAAT 360

CTCATATGTC TGGGTCTCCA GGCCCTGGTG GCAGCAACAC AGCTCCATCC ACACCAGTCA 420

TTGGGGGCAG TGATAAGCCT GGTATGGAAG AAAAGGCCCA GCCCCCTGAG GCCGGCCCTC 480

AGGGGCTGCA CGACCTGGGC AGAAGCTCCT CCAGCCTCCT GGCCTCCCCA GGCCACATCT 540

CAGTCAA6GA GCCGACCCCC AGCATCGCCT CGGACATCTC GCTGCCCATC GCCACCCAGG 600

AGCTGCGGCA AAGGCTGAGG CAGCTGGAAA ACGGGACGAC CCTGGGACAG TCTCCACTGG 660

GGCAGATCCA GCTGACCATC CGGCACAGCT CGCAGAGAAA CAAGCTTATC GTGGTCGTGC 720

ATGCCTGCAG AAACCTCATT GCCTTCTCTG AAGACGGCTC TGACCCCTAT GTCCGCATGT 780

ATTTATTACC AGACAAGAGG CGGTCAGGAA GGAGGAAAAC ACACGTGTCA AAGAAAACAT 840

TAAATCCAGT GTTTGATCAA AGCTTTGATT TCAGT6TTTC GTTACCAGAA GTGCAGAGGA 900

6AACGCTCGA CGTTGCCGTG AAGAACAGTG GCGGCTTCCT GTCCAAAGAC AAAGGGCTCC 960

TTGGCAAAGT ATTGGTTGCT CTGfGCATCTG AAGAACTTGC CAAAGGCTGG ACCCAQTGGT 1020

ATGACCTCAC GG7VAGATGGG ACGA6GCCTC AGGCGATGAC ATAGCCGCAG CAGGCAGGAG 1080

GCGTCCTCTT CAGCGTAGCT CTCCACCTCT ACCCGGAACA CACCCTCTCA CAGACGTACC 1140

AATGTTATTT TTATAATTTC ATGGATTTAG TTATACATAC CTTAATAGTT TTATAAAATT 1200

GTTGACATTT CAGGCAAATT TGGCCAATAT TATCATTGAA TTTTCTGTGT TGGATTTCCT 1260

CTAGGATTTC GCCAGTTCCT ACAACGTGCA GTAGGGCGGC GGTAGCTCTT GTGTCTGTGG 1320

ACTCT6CTCA GCTGTGTCCG TAGGAGTCGG ATGTGTCTGT GCTTTATTAT GGCCTTGTTT 1380

ATATATCACT GAGGTATACT ATGCCAT6TA AATAGACTAT TTTTTATAAT CTTAACATGC 1440

TGGTTTAAAT TCAGAAGGAA ATAGATCAAG GAAATATATA TATTTTCTTC TAAAACTTAT 1500

TAAATTCGTG TGACAAATAA TCATTTTCAT CTTGGCAGCA AAAAGTTCTC AGTGACCTAT 1560

TTTGTGGTGT TTCTTTTTGA AAAGAAAAGC TGAAATATTA TTAAATGCTA GTATGTTTCT 1620

GCCCATTATG AAAGATGAAA TAAAGTATTC AAAATATTAA AAAAAAAAAA AAAAAATTCC 1680

TGCGGCCGC 1689

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1505 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 



GAATTC66CA CGAGGAGCAG ATCT6CAA6A GTTTCGTTTA TGGA6GCTGC

A6AACAACTA CCTTCG6GAA GAAGAGTGCA TTCTAGCCTG TCGGGGT6T6 CAAGGTCSOOC 120

CTTTGAGAGG CAGCTCTGGG GCTCAGGCGA CTTTCCCCCA 6GGCCCCTCC ATG6AAAGGC 180

GCCATCCAGT GTGCTCTGGC ACCTGTCAGC CCACCCAGTT CCGCTGCAGC AATGGCTGCT 240

GCATCGACAG TTTCCTGGAG TGTGACGACA CCCCCAACTG CCCCGACGCC TCCGACGAGG 300

CTGCCTGTGA AAAATACACG AGTGGCTTTG ACGAGCTCCA GCGCATCCAT TTCCCCAGCG 360

ACAAAGGGCA CTGCGTGGAC CTGCCAGACA CAGGACTCTG CAAGGAGAGC ATCCCGCGCT 420

G6TACTACAA CCCCTTCAGC GAACACTGCG CCCGCTTTAC CTATGGTGGT TGTTACGGCA 480

ACAAGAACAA CTTTGAGGAA GAGCAGCAGT X X W X

A66ATGTGTT TGGCCTGAGG CGGGAAATCC CCATTCCCA6 CACAGGCTCT GTGGAGATG6 600

CT6TCGCAGT OTTCCTGGTC ATCT6CATTG TGGTGGTGGT A6CCATCTTG GGTTACT6CT 660

TCTTCAAC^Ul" CCAGAGAAAG GACTTCCACG 6ACACCACCA CCACCCACCA CCCACCCCT6

CCAGCTCCAC TGTCTCCACT ACCGAGGACA OGGAGCACCT 6GTCTATAAC CACACCACGC 780

GGCCCCTCTG AGCCTGGGTC TCACCGGCTC TCACCTG6CC CTGCTTCCTG CTT6CCAAGG 840

CAGAGGCCTG 6GCTGGGAAA AACTTTGGAA CCAGACTCTT GCCTGTTTCC CAGGCCCACT 900

GTGCCTCAGA 6ACCAGGGCT CCAGCCCCTC TTGGAGAAGT CTCAGCTAAG CTCACGTCCT 960

GAGAAAGCTC AAAGGTTTGG AAGGAGCAGA AAACCCTTGG GCCAGAAGTA CCAGACTAGA 1020

TGGACCTGCC TGCATAGGAG TTTGGAGGAA GTTGGAGTTT TGTTTCCTCT GTTCAAAGCT 1080

GCCTGTCCCT ACCCCATGGT GCTAGGAAGA GGAGTGGGGT GGTGTCAGAC CCTGGAGGCC 1140

CCAACCCTGT CCTCCCGAGC TCCTCTTCCA TGCTGTGCGC CCAGGGCTGG GAGGAAGGAC 1200

TTCCCTGTGT AGTTTGTGCT GTAAAGAGTT GCTTTTTGTT TATTTAATGC TGTGGCATGG 1260

GT6AAGAGGA GGGGAAGAGG CCTGTTTGGC CTCTCTATCC TCTCTTCCTC TTCCCCCAAG 1320

ATTGAGCTCT CTGCCCTTGA TCAGCCCCAC CCTGGCCTA6 ACCAGCAGAC AGAGCCAGGA 1380

6AAGCTCAGC TGCATTCCGC AGCCCCCACC CCCAAGGTTC TCCAACATCA CAGCCCAGCC 1440

CGCCCACTGG GTAATAAAAG TGGTTTGTGG AAAAAAAAAA AAAAAAAAAA AAGTCCTGCG 1500

6COGC 1505



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2002 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

GAATTCGGCA CGAGGGCCAT GGCCGGGCTA TCCCGCGGGT CCGCGCGCGC ACTGCTCGCC 60 

GCCCTGCTGG CGTCGACGCT GTTGGCGCTG CTCGTGTCGC CCGCGCGGGG TCGCGGCGGC 120 

CGGGACCACG GGGACTGGGA CGAGGCCTCC CGGCTGCCGC CGCTACCACC CCGCGAGGAC 180 

GCGGCGCGCG TGGCCCGCTT CGTGACGCAC GTCTCCGACT GGGGCGCTCT GGCCACCATC 240

TCCACGCTGG AGGCGGTGCG CGGCCGGCCC TTCGCCGACG TCCTCTCGCT CAGCGACGGG 300

CCCCCGGGCG CGGGCAGCGG CGTGCCCTAT TTCTACCTGA GCCCGCTGCA GCTCTCCGTG 360

AGCAACCTGC AGGAGAATCC ATATGCTACA CTGACCATGA CTTTGGCACA GACCAACTTC 420

TGCAAGAAAC ATGGATTTGA TCCACAAAGT CCCCTTTGTG TTCACATAAT GCTGTCAGGA 480

ACTGTGACCA AGGTGAATGA AACAGAAATG GATATTGCAA AGGATTCGTT ATTCATTCGA 540

CACdCTGAGA TGAAAACdT& GCCTTCCAGC GATAATTGGT TCTTTGCTAA GTTGMVTATA 600

ACCAATATCT GGGTCCTGGA CTACTTTGGT GGACCAAAAA TCGTGACACC AGAAGAATAT 660

TATAATGTCA CAGTTCAGTG AAGCAGACTG TGGTGAATTT AGCAACACTT ATGAAGTTTC 720

TTAAAGTGGC TCATACACAC TTAAAAGGCT TAATGTTTCT CTGGAAAGCG TCCCAGAATA 780 

TTAGCCAGTT TTCTGTCACA TGCTGGTTTG TTTGCTTGCT TGTTTACTTG CTTGTTTACC 840 

AATAGAGTTG ACCTGTTATT GGATTTCCTG GAAGATGTGG TAGCTACTTT TTTCCTATTT 900 

TGAAGCCATT TTCGTAGAGA AATATCCTTC ACTATAATCA AATAAGTTTT GTCCCATCAA 960 

TTCCAAAGAT GTTTCCAGTG GTGCTCTTGA AGAGGAATGA GTACCAGTTT TAAATTGCCC 1020 

ATTGGCATTT GAAGGTAGTT GAGTATGTGT TCTTTATTCC TAGAAGCCAC TGTGCTTGGT 1080 

AGAGTGCATC ACTCACCACA GCTGCCTCTT GAGCTGCCTG AGCCTGGTGC AAAAGGATTG 1140 

GCCCCCATTA TGGTGCTTCT GAATAAATCT TGCCAAGATA GACAAACAAT GATGAAACTC 1200

AGATGGAGCT TCCTACTCAT GTTGATTTAT GTCTCACAAT CCTGGGTATT GTTAATTCAA 1260 

CATAGGGTGA AACTATTTCT GATAAAGAAC TTTTGAAAAA CTTTTTATAC TCTAAAGTGA 1320 

TACTCAGAAC AAAAGAAAGT CATAAAACTC CTGAATTTAA TTTCCCCACC TAAGTCGAGA 1380

CAGTATTATC AAAACACATG TGCACACAGA TTATTTTTTG GCTCCAAAAC TGGATTGCAA 1440

AAGAAAGAGG AGAGATATTT TGTGTGTTCC TGGTATTCTT TTATAAGTAA AGTXACCCAG 1500

GCATGGACCA GCTTCAGCCA GGGACAAAAT CCCCTCCCAA ACCACTCTCC ACAGCTTTTT 1560

AAAAATACTT CTACTCTTAA CAATTACCTA AGGTTCCTTC AAACCCCCCC AACTCTTAAT 1620

AGCTTCTAGT GCTGCTACAA TCTAAGTCAG GTCACCAGAG GGAAGAGAAC ATGGCATTAA 1680

[illegible] 1740

AGATGACATA AGATTCCTCT TAAAGAGGAA ATGTCAGGAA TCAAGCCACT GAATCCTTAA 1800

AGAGAAAAGT TGAATATGAG TCATTGTGTC TGAAAACTGC AAAGTGAACT TAACTGAGAT 1860

CCAGGAAACA GGTTCTGTTT AAGAAAAATA ATTTATACTA AATTTAGTAA AATGGACTTC 1920

TTATTCAAAG CATCAATAAT TAAAAGAATT ATTTTAAAAA AAAAAAAAAA AAAAAAAAAA 1980

AAAAAAAAAT TCCTGCGGCC GC 2002



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1322 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

















GAATTCGGCA CGAGGGCCAC GACTCTGCTG GCATTTCTTC TATAGCCACT GGAATCTGAT 60

CCTGATTGTC TTCCACTACT ACCAGGCCAT CACCACTCCG CCTGGGTACC CACCCCAGGG 120

CAGGAATGAT ATCGCCACOG TCTCCATCTG TAAGAAGTGC ATTTACCCCA AGCCAGCCCG 180

AACACACCAC TGCAGCATCT GCAACAGGTG TGTGCTGAAG ATGGATCACC ACTGCCCCTG 240

GCTAAACAAT TGTGTGGGCC ACTATAACCA TCGGTACTTC TTCTCTTTCT GCTTTTTCAT 300

GACTCTGGGC TGTGTCTACT GCAGCTATGG AAGTTGGGAC CTTTTCCGGG AGGCTTATGC 360

TGCCATTGAG AAAATGAAAC AGCTCGACAA GAACAAACTA CAGGCGGTTG CCAACCAGAC 420

TTATCACCAG ACCCCACCAC CCACCTTCTC CTTTCGAGAA AGGATGACTC ACAAGAGTCT 480

TGTCTACCTC TGGTTCCTGT GCAGTTCTGT GGCACTTGCC CTGGGTGCCC TAACTGTATG 540

GCATGCTGTT CTCATCAGTC GAGGTGAGAC TAGCATCGAA AGGCACATCA ACAAGAAGGA 600

GAGACGTCGG CTACAGGCCA AGGGCAGAGT ATTTAGGAAT CCTTACAACT ACGGCTGCTT 660

GGACAACTGG AAGGTATTCC TGGGTGTGGA TACAGGAAGG CACTGGCTTA CTCGGGTGCT 720

CTTACCTTCT ACTCACTTGC CCCATGGGAA TGGAATGAGC TGGGAGCCCC CTCCCTGGGT 780

GACTGCTCAC TCAGCCTCTG TGATGGCAGT GTGAGCTGGA CTGTGTCAGC CACXSACTCGA 840

GCACTCATTC TGCTCCCTAT GTTATTTCAA GGGCCTCCAA GGGCAGCTTT TCTCAGAATC 900

CTTGATCAAA AAGAGCCAGT GGGCCTGCCT TAGGGTACCA TGCAGGACAA TTCAAGGACC 960

AGCCTTTTTA CCACTGCAGA AGAAAGACAC AATGTGGAGA AATCTTAGGA CTGACATCCC 1020

TTTACTCAGG CAAACAGAAG TTCCAACCCC AGACTAGGGG TCAGGCAGCT AGCTACCTAC 1080

CTTGCCCAGT GCTGACCCGG ACCTCCTCCA GGATACAGCA CTGGAGTTGG CCACCACCTC 1140

TTCTACTTGC TGTCTGAAAA AACACCTGAC TAGTACAGCT GAGATCTTGG CTTCTGAACA 1200

GGGCAAAGAT ACCAGGCCTG CTGCTGAGGT CACTGCCACT TCTCACATGC TGCTTAAGGG 1260

AGCACAAATA AAGGTATTCG ATTTTTAAAA AAAAAAAAAA AAAAAAAAAT TCCTGCGGCC 1320

GC 1322



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

GAATTCGGCA CGAGGAGCCT GCCTTCATCT AGGATGGCTC CTCTGGGCAT GCTGGTTGGG 60

CTGCTGATGG CCGCCTGCTT CACCTTCTGC CTCAGTCATC AGAACCTGAA GGAGTTTGCC 120

CTGACCAACC CAGAGAAGAG CAGCACCAAA GAAACAGAGA GAAAAGAAAC CAAAGCCGAG 180

GAGGAGCTGG ATGCCGAAGT CCTGGAGGTG TTCCACCCGA CGCATGAGTG GCAGGCCCTT 240

CAGCCAGGGC AGGCTGTCCC TGCAGGATCC CACGTACGGC TGAATCTTCA GACTGGGGAA 300

AGAGAGGCAA AACTCCAATA TGAGGACAAG TTCCGAAATA ATTTGAAAGG CAAAAGGCTG 360

GATATCAACA CCAACACCTA CACATCTCAG GATCTCAAGA GTGCACTGGC AAAATTCAAG 420

GAGGGGGCAG AGATGGAGAG TTCAAAGGAA GACAAGGCAA GGCAGGCTGA GGTAAAGCGG 480

CTCTTCCGCC CCATTGAGGA ACTGAAGAAA GACTTTGATG AGCTGAATGT TGTCATTGAG 540

ACTGACATGC AGATCATGGT ACGGCTGATC AACAAGTTCA ATAGTTCCAG CTCCAGTTTG 600

OAAGAGAAGA TTGCTGCGCT CTTTGATCTT GAATATTATG TCCATCAGAT GGACAATGCG 660

CAGGACCTGC TTTCCTTTGG TGGTCTTCAA GTGGTGATCA ATGGGCTGAA CAGCACAGAG 720

CCCCTCGTGA AGGAGTATGC TGCGTTTGTG CTGGGCGCTG CCTTTTCCAG CAACCCCAAG 780 

GTCCAGGTGG AGGCCATCGA AGGGGGAGCC CTGCAGAAGC TGCTGGTCAT CCTGGCCACG 840 






GAGCAGCCGC TCACTGCAAA GAAGAAGGTC CTGTTTGCAC TGTGCTCCCT GCTGCGCGAC 900

TTCCCCTATG CCCAGCGGCA GTTCCTGAAG CTCGGGGGGC TGCAGGTCCT GAGGACCCTG 960

GTGCAGGAGA AGGGCACGGA GGTGCTCGCC GTGCGCGTGG TCAC2ACTGCT CTACGACCTG 1020

GTCACGGAGA AGATGTTCGC CGAGGAGGAG GCTGAGCTGA CCCAGGAGAT GTCCCCAGAG 1080

AAGCTGCAGC AGTATCGCCA GGTACACCTC CTGCCAGGCC TGTGGGAACA GGGCTGGTGC 1140

GAGATCAOGG CCCACCTCCT GGCGCTGCCC GAGCATGATG CCCGTGAGAA GGTGCTGCAG 1200

ACACTGGGCG TCCTCCTGAC CACCTGCCGG GACCGCTACC GTCAGGACCC CCAGCTCGGC 1260

AGGACACTGG CCAGCCTGCA GGCTGAGTAC CAGGTCCTGG CCTGCAGGAT [illegible] 1320

GGTGAGGACO AGGGCTACTX CCAGGAGCTG CTGGGCTCTG TCAACAGCTT GCTGAAGGAG 1380

CTGAGATGAG GCCCCACACC AGGACTGGAC TGGGATGCX!G CTAGTGAGGC TGAGGGGTGC 1440

CAGCGTGGGT GGGCTTCTCA GGCAGGAGGA CATCTTGGCA GTGCTGGCTT GGCCATTAAA 1500

TGGAAACCTG AAGGCCAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 1560

TTCCTGCGGC CGC 1573



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1185 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 



GAATTCGGCA CGAGGGGGCT TTAAGGGACA GCTGAGCCGG CAGGTGGCAG ATCAGATGTG 60

GCAGGCTGGG AAAAGACAAG CCTCCAGGGC CTTCAGCTTG TACGCCAACA TCGACATCCT 120

CAGACCCTAC TTTGATGTGG AGCCTGCTCA GGTGCGAAGC AGGCTCCTGG AGTCCATGAT 180

CCCTATCAAG ATGGTCAACT TCCCCCAGAA AATTGCAGGT GAACTCTATG GACCTCTCAT 240

GCTGGTCTTC ACTCTGGTTG CTATCCTACT CCATGGGATG AAGACGTCTG ACACTATTAT 300

CCGGGAGGGC ACCCTGATGG GCACAGCCAT TGGCACCTGC TTCGGCTACT GGCTGGGAGT 360

CTCATCCTTC ATTTACTTCC TTGCCTACCT GTGCAACGCC CAGATCACCA TGCTGCAGAT 420

GTTGGCACTG CTGGGCTATG GCCTCTTTGG GCATTGCATT GTCCTGTTCA TCACCTATAA 480

TATCCACCTC CAOGCCCTCT TCTACCTCTT CTGGCTGTTG GTGGGTGGAC TGTCCACACT 540

GCGCATGGTA GCAGTGTTGG TGTCTCGGAC CGTGGGCCCC ACACAGCGGC TGCTCCTCTG 600

TGGCACCCTG GCTGCCCTAC ACATGCTCTT CCTGCTCTAT CTGCATTTTG CCTACCACAA 660

AGTGGTAGAG GGGATCCTGG ACACACTGGA GGGCCCCAAC ATCCCGCCCA TCCAGAGGGT 720

CCCCAGAGAC ATCCCTGCCA TGCTCCCTGC TGCTCGGCTT CCCACCACCG TCCTCAACGC 780

CACAGCCAAA GCTGTTGCGG TGACCCTGCA GTCACACTGA CCCCACCTGA T^TTCTTGGC 840

CAGTCCTCTT TCCCGCAGCT GCAGAGAGGA GGAAGACTAT TAAAGGACAG [illegible] 900

ATGTTTCGTA GATGGGGTTT GCAGCTGCCA CTGAGCTGTA GCTGCGTAAG TACCTCCTTG 960

ATGCCTGTOG GCACTTCTGA AAGGCACAAG GCCAAGAACT CCTGGCCAGG ACTGCAAGGC 1020

TCTGCAGCCA ATGCAGAAAA TGGGTCAGCT CCTTTGAGAA CCCCTCCCCA CCTACCCCTT 1080

CCTTCCTCTT TATCTCTCCC ACATTGTCTT GCTAAATATA GACTTGGTAA TTAAAATGTT 1140

GATTGAAGTC TGGAAAAAAA AAAAAAAAAA AATTCCTGCG GCCGC 1185



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1226 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 



GAATTCGGCA CGAGGCAAGC CACCATCTTC CTTCGGCCTG CACCCCTTTA AAGGCACCCA 60

GACCCCTdTG GAAAAAGATG AXCTGAAGCC CTTTGACATC CTCCAGCCTA AGGAGtAdTT 120

CCAGCTCAGC CGCGACACGG TCATTAAGAT GGGAAGTGAG AACGAGGCCC TGGATCTCTC 180

CATGAAGTCA GTGCCCTGGC TCAAGGCTGG TGAAGTCAGT CCCCCAATCT TCCAGGAAGA 240

TGCAGCCCTA GACCTGTCAG TGGCAGCCCA CCGGAAATCC GAGCCTCCCC CTGAGACACT 300

GTATGACAGT GGTGCATCAG TGGACAGCTC AGGTCACACA GTGATGGAGA AACTTCCCAG 360

TGGCATGGAA ATTTCTTTTG CCCCTGCCAC GTCCCATGAG GCCCCAGCCA TGATGGATAG 420

TCACATCAGC AGCAGTGATG CTGCTACCGA GATGCTCAGC CAGCCCAACC ACCCCAGCGG 480

CGAAGTCAAG GCTGAAAATA ACATTGAGAT GGTGGGCGAG TCCCAGGCGG CCAAGGTCAT 540

TGTCTCTGTC GAAGATGCTG TGCCTACCAT ATTCTGTGGC AAGATCAAAG GCCTCTCAGG 600

GGTGTCCACC AAAAACTTCT CCTTCAAAAG AGAAGACTCC GTGCTTCAGG GCTATGACAT 660

CAACAGCCAA GGGGAAGAGT CCATGGGAAA TGCAGAGCCC CTTAGGAAAC CCATCAAAAA 720

CCGGAGCATA AAGTTAAAGA AAGTGAACTC CCAGGAAGTA CACATGCTCC CAATCAAAAA 780

ACAACGGCTG GCCACCTTTT TTCCAAGAAA GTAAATAACG GCTTTTTAAA ATTTGTATGA 840

TTATAATATG GGGAAAGGTG CATTGGTTTT ATAAAAAGGC ATTTAAAACA AATTATCTTT 900

GTTAATTATT TTGGGGAGTA GTTGGGAAAT GGAAAGGTGA ATTGGCTCTA GAGGCCCTGT 960

ATGCTAGTAT CATTTTCTTT TTTAATTTTT GACTTTTCAC AAATGAGTAA ATAAGAGCAA 1020

CCTATTTTTC AAGCAGATTG CACATTTTTT GCAGCTTTAA TGGAATATTG GGTGAATTAG 1080

AGGGGTAAAA AAAGCTATTT TCATTGCCAC AAAGTGCTTT GATGATGTAA TACCTAATAA 1140

AGGGTAGGAT GAATATTTCA CAATAAATGT TTGTTTGCAC TAAAAAAAAA AAAAAAAAAA 1200

AAAAAAAAAA AAATTCCTGC GGCCGC 1226

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1049 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 



GAATTCGGCA CGAGGGCGCC ATGGTGAAGG TGACGTTCAA CTCCGCTCTG GCCCAGAAGG 60

AGGCCAAGAA GGACGAGCCC AAGAGCGGCG AGGAGGCGCT CATCATCCCC CCCGACGCCG 120

TCGCGGTGGA CTGCAAGGAC CCAGATGATG TGGTACCAGT TGGCCAAAGA AGAGCCTGGT 180

GTTGGTGCAT GTGCTTTGGA CTAGCATTTA TGCTTGCAGG TGTTATTCTA GGAGGAGCAT 240

ACTTGTACAA ATATTTTGCA CTTCAACCAG ATGACGTGTA CTACTGTGGA ATAAAGTACA 300

TCAAAGATGA TGTCATCTTA AATGAGCCCT CTGCAGATGC CCCAGCTGCT CTCTACCAGA 360

CAATTGAAGA AAATATTAAA ATCTTTGAAG AAGAAGAAGT TGAATTTATC AGTGTGCCTG 420

TCCCAGAGTT TGCAGATAGT GATCCTGCCA ACATTGTTCA TGACTTTAAC AAGAAACTTA 480

CAGCCTATTT AGATCTTAAC CTGGATAAGT GCTATGTGAT CCCTCTGAAC ACTTCCATTG 540

TTATGCCACC CAGAAACCTA CTGGAGTTAC TTATTAACAT CAAGGCTGGA ACCTATTTGC 600

CTCAGTCCTA TCTGATTCAT GAGCACATGG TTATTACTGA TCGCATTGAA AACATTGATC 660

ACCTGGGTTT CTTTATTTAT CGACTGTGTC ATGACAAGGA AACTTACAAA CTGCAACGCA 720

GAGAAACTAT TAAAGGTATT CAGAAACGTG AAGCCAGCAA TTGTTTCGCA ATTCGGCATT 780

TTGAAAACAA ATTTGCCGTG GAAACTTTAA TTTGTTCTTG AACAGTCAAG AAAAACATTA 840

TTGAGGAAAA TTAATATCAC AGCATAACCC CACCCTTTAC ATTTTGTTGC AGTTGATTAT 900

TTTTTAAAGT CTTCTTTCAT GTAAGTAGCA AACAGGGCTT TACTATCTTT TCATCTCATT 960

AATTCAATTA AAACCATTAC CTTAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 1020

AAAAAAAAAA AAAAAATTCC TGCGGCCGC 1049

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1142 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 



GAATTCGGCA CGAGGGGAGA ATACTTTTTG CGATGCCTAC TGGAGACTTT GATTCGAAGC 60

CCAGTTGGGC CGACCAGGTG GAGGAGGAGG GGGAGGACGA CAAATGTGTC ACCAG0GAGC 120

TCCTCAAGGG GATCCCTCTG GCCACAGGTG ACACCAGCCC AGAGCCAGAG [illegible] 180

GAGCTCCACT GCCGCCTCCC AAGGAGGTCA TCAACGGAAA CATAAAGACA GTGACAGAGT 240

ACAAGATAGA TGAGGATGGC AAGAAGTTCA AGATTGTCCG CACCTTCAGG ATTGAGACCC 300

GGAAGGCTTC AAAGGCTGTC GCAAGGAGGA AGAACTGGAA GAAGTTCGGG AACTCAGAGT 360

TTGACCCCCC CGGACCCAAT GTGGCCACCA CCACTGTCAG TGACGATGTC TCTATGACGT 420

TCATCACCAG CAAAGAGGAC CTGAACTGCC AGGAGGAGGA GGACCCTATG AACAAATTCA 480

AGGGCCAGAA GATCGTGTCC TGCCGCATCT GCAAGGGCGA CCACTGGACC ACCCGCTGCC 540

CCTACAAGGA TACGCTGGGG CCCATGCAGA AGGAGCTGGC CGAGCAGCTG GGCCTGTCTA 600

CTGGCGAGAA GGAGAAGCTO CCGGGAGAGC TAGAGCCGGT GCAGGCCACG CAGAACAAGA 660

CAGGGAAGTA TGTGCCGCCG AGCCTGCGCG ACGGGGCCAG CCGCCGCGGG GAGTCCATGC 720

AGCCCAACCG CAGAGCCGAC GACAACGCCA CCATCCGTGT CACCAACTTG CGCAGAGGAC 780

ACGCGTGAGA CCGACCTGCA GGAGCTCTTC CGGCCTTTCG GCTCCATCTC C0GCATCTAC 840

CTGGCTAAGG ACAAGACCAC TGGCCAATCC AAGGGCTTTG CCTTCATCAG CTTCCACCGC 900

CGCGAGGATG CTGCGCGTGC CATTGCCGGG GTGTCCGGCT TTGGCTACGA CCACCTCATC 960

CTCAACGTCG AGTGGGCCAA GCCGTCCACC AACTAAGCCA GCTGCCACTG TGTACTCGGT 1020

CCGGGACCCT TGGCGACAGA AGACAGCCTC CGAGAGCGCG GGCTCCAAGG GCAATAAAGC 1080

AGCTCCACTC TCAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAT TCCTGCGGCC 1140

GC 1142

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1696 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:





[illegible] CGAGCGAAAC ATGGCGGTAG GCTGGGACCA TAACACAAGC ATGACTATAT 60

[illegible] CTGAAGATGA GGCGACTGAA TCGGAAAAAA ACTTTAAGTT 120

[illegible] TTTCCGAAOG [illegible] TTCCTGAGAG CTATGTAGAG ACTTCAGCCA 180

OTGGAGGTAC AGTTTCTCTA ATAGCATTTA CAACTATGGC TTTATTAACC ATAATGGAAT 240

TCTCAGTATA TCAAGATACA TGGATGAAGT ATGAATACGA AGTAGACAAG GATTTTTCTA 300

GCAAATTAAG AATTAATATA GATATTACTG TTGCCATGAA GTGTCAATAT GTTGGAGCGG 360

ATGTATTGGA TTTAGCAGAA ACAATGGTTG CATCTGCAGA TGGTTTAGTT TATGAACCAA 420

CAGTATTTGA TCTTTCACCA CAGCAGAAAG AGTGGCAGAG GATGCTGCAG CTGATTCAGA 480

GTAGGCTACA AGAAGAGCAT TCACTTCAAG ATGTGATATT TAAAAGTGCT TTTAAAAGTA 540

CATCAACAGC TCTTCCACCA AGAGAAGATG ATTCATCACA GTCTCCAAAT GCATGCAGAA 600

TTCATGGCCA TCTATATGTC AATAAAGTAG CAGGGAATTT TCACATAACA GTGGGCAAGG 660

CAATTCCACA TCCTCGTGGT CATGCACATT TGGCAGCACT TGTCAACCAT GAATCTTACA 720

ATTTTTCTCA TAGAATAGAT CATTTGTCTT TTGGAGAGCT TGTTCCAGCA ATTATTAATC 780

CTTTAGATGG AACTGAAAAA ATTGCTATAG ATCACAACCA GATGTTCCAA TATTTTATTA 840

CAGTTGTGCC AACAAAACTA CATACATATA AAATATCAGC AGACACCCAT CAGTTTTCTG 900

TGACAGAAAG GGAACGTATC ATTAACCATG CTGCAGGCAG CCATGGAGTC TCTGGGATAT 960

TTATGAAATA TGATCTCAGT TCTCTTATGG TGACAGTTAC TGAGGAGCAC ATGCCATTCT 1020

GGCAGTTTTT TGTAAGACTC TGTGGTATTG TTGGAGGAAT CTTTTCAACA ACAGGCATGT 1080

TACATGGAAT TGGAAAATTT ATAGTTGAAA TAATTTGCTG TCGTTTCAGA CTTGGATCCT 1140

ATAAACCTGT CAATTCTGTT CCTTTTGAGG ATGGCCACAC AGACAACCAC TTACCTCTTT 1200

TAGAAAATAA TACACATTAA CACCTCCCGA TTGAAGGAGA AAAACTTTTT GCCTGAGACA 1260

TAAAACCTTT TTTTAATAAT AAAATATTGT GCAATATATT CAAAGAAAAG AAAACACAAA 1320

TAAGCAGAAA ACATACTTAT TTTAAAAAAG AAAAAAAAGG ATAiiAAAAAC CCAAACTGAA 1380

ATTCTATATA CGTTGTGTCT GTTACAAATG TCGTAGAAGA AAXCATGCAG CTAAACGATG 1440

AAGAAGCCCA ACTGGAGTGT TGCTTTGAAG ATGACGCCTT CTTATATTTT CATAGCAAAT 1500

GGGTGGTATC AAAATCAGAC ATTGCTTCTT GCTGATAAAA AGCCTGAAGG AAATAAGTGA 1560

AACTACATCT ATGGGAAAAA AAAAAACATT GAGAAGTGCA AATGTTCGCA TCCTTTTGTT 1620

TTTAAAAGAT ATGATGTCAG AATAAAATGT GGAAAACATA CGGAAAAAAA AAAAAAAAAA 1680

AAATTCCTGC GGCCGC 1696

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1100 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 



GAATTCGGCA CGAGGCGGCA OGAGGCGGCA OGAGGGTGGC ATATCACGGC CATGGGGTCT 60

CAGCATTCCG CTGCTGCTCG CCCCTCCTCC TGCAGGCGAA AGCAAGAAGA TGACAGGGAC 120

GGTTTGCTGG CTGAACGAGA GCAGGAAGAA GCCATTGCTC AGTTCCCATA TGTGGAATTC 180

ACCGGGAGAG ATAGCATCAC CTGTCTCACG TGCCAGGGGA CAGGCTACAT TCCAACAGAG 240

CAAGTAAATG AGTTGGTGGC TTTGATCCCA CACAGTGATC AGAGATTGCG CCCTCAGCGA 300

ACTAAGCAAT ATGTCCTCCT GTCCATCCTG CTTTGTCTCC TGGCATCTGG TTTGGTGGTT 360

TTCTTCCTGT TTCCGCATTC AGTCCTTGTO GATGATGACG GCATCAAAGT GGTGAAAGTC 420

ACATTTAATA AGCAAGACTC CCTTGTAATT CTCACCATCA TGGCCACCCT GAAAATCAGG 480

AACTCCAACT TCTACACGGT GGCAGTGACC AGCCTGTCCA GCCAGATTCA GTACATGAAC 540

AGAGTGGTCA GTACATATGT GACTACTAAC GTCTCCCTTA TTCCACCTCG GAGTGAGCAA 600

CTGGTGAATT TTACCGGGAA GGCCGAGATG GGAGGACCGT TTTCCTATGT GTACTTCTTC 660

TGCAdSCTAC CTGAGATCCT GGTGCACAAC ATAGTGATCT TCATGCGAAC TTCAGTGAAG 720

ATTTCATACA TTGGCCTCAT GACCCAGAGC TCCTTGGAGA CACATCACTA TGTGGATTGT 780

GGAGGAAATT CCACAGCTAT TTAACAACTG CTATTGGTTC TTCCACACAG CGCCTGTAGA 840

AGAGAGCACA GCATATGTTC CCAAGGCCTG AGTTCTGGAC CTACCCCCAC GTGGTGTAAG 900

CAGAGGAGGA ATTGGTTCAC TTAACTCCCA GCAAACATCC TCCTGCCACT TAGGAGGAAA 960

CACCTCCCTA TGGTACCATT TATGTTTCTC AGAACCAGCA GAATCAGTGC CTAGCCTGTG 1020

CCCAGCAAAT AGTTGGCACT CAATAAAGAT TTGCAGAATT TAAAAAAAAA AAAAAAAAAA 1080

AAAAAAATTC CTGCGGCCGC 1100



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1588 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

GAATTCGGCA CGAGGGTACC TGCTTTTCTA TTGCCTCTTT GAAACAATGG TCACGTGTTT 60

CCATGTTCCC TACTCGGCTC TCACCATGTT CATCAGCACC GAGCAGACTG AGCGGGATTC 120 

TGCCACCGCC TATCGGATGA CTGTGGAAGT GCTGGGCACA GTGCTGGGCA CGGCGATCCA 180 

GGGACAAATC GTGGGCCAAG CAGACACGCC TTGTTTCCAG GACCTCAATA GCTCTACAGT 240

AGCTTCACAA AGTGCCAACC ATACACATGG CACCACCTCA CACAGGGAAA CGCAAAAGGC 300

ATACCTGCTG GCAGCGGGGG TCATTGTCTO TATCTATATA ATCTGTGCTG TCATCCTOAT 360

CCTGGGCGTG CGGGAGCAGA GAGAACCCTA TGAAGCCCAG CAGTCTGAGC CAATCGCCTA 420

CTTCCGGGGC CTACGGCTGG TCATGAGCCA CGGCCCATAC ATCAAACTTA TTACTGGCTT 480 

CCTCTTCACC TCCTTGGCTT TCATGCTGGT GGAGGGGAAC TTTGTCTTGT TTTGCACCTA 540 

CACCTTGGGC TTCCGCAATG AATTCCAGAA TCTACTCCTG GCCATCATGC TCTCGGCCAC 600 

TTTAACCATT CCCATCTGGC AGTGGTTCTT GACCCGGTTT GGCAAGAAGA CAGCTGTATA 660 

TGTTGGGATC TCATCAGCAG TGCCATTTCT CATCTTGGTG GCCCTCATGG AGAGTAACCT 720 

CATCATTACA TATGCGGTAG CTGTGGCAGC TGGCATCAGT GTGGCAGCTG CCTTCTTACT 780

ACCCTGGTCC ATGCTGCCTG ATGTCATTGA CGACTTCCAT CTGAAGCAGC CCCACTTCCA 840

TGGAACCGAG CCCATCTTCT TCTCCTTCTA TGTCTTCTTC ACCAAGTTTG CCTCTGGAGT 900

GTCACTGGGC ATTTCTACCC TCAGTCTGGA CTTTGCAGGG TACCAGACCC GTGGCTGCTC 960

GCAGCCGGAA CGTGTCKAGT TTACACTGAA CATGCTCGTG ACGATGGCTC CCATAGTTCT 1020

CATCCTGCTG GGCCTGCTGC TCTTCAAAAT GTACCCCATT GATGAGGAGA GGCGGCGGCA 1080 

GAATAAGAAG GCCCTGCAGG CACTGAGGGA CGAGGCCAGC AGCTCTGGCT GCTCAGAAAC 1140 

AGACTCCACA GAGCTGGCTA GCATCCTCTA GGGCCCGCCA CGTTGCCCGA AGCCACCATG 1200 

CAGAAGGCCA CAGAAGGGAT CAGGACCTGT CTGCCGGCTT GCTGAGCAGC TGGACTGCAG 1260 

GTGCTAGGAA GGGAACTGAA GACTCAAGGA GGTGGCCCAG GACACTTGCT GTGCTCACTG 1320 

TGGGGCCGGC TGCTCTGTGG CCTCCTGCCT CCCCTCTGCC TGCCTGTGGG GCCAAGCCCT 1380 

GGGGCTGCCA CTGTGAATAT GCCAAGGACT GATCGGGCCT AGCCCGGAAC ACTAATGTAG 1440 

AAACCTTTTT TTTACAGAGC CTAATTAATA ACTTAATGAC TGTGTACATA GCAATGTGTG 1500

TGTATGTATA TGTCTGTGAG CTATTAATGT TATTAATTTT CATAAAAGCT GGAAAGCAAA 1560

AAAAAAAAAA AAAAATTCCT GCGGCCGC 1588

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1535 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 



GAATTCGGCA CGAGGCGGAA GTCCCGTCTC ACGGTTGCCC TGGCAGCGCG CGAGGCTGGT 60

GAGTCGGCAG CCCTGTGGCA GCCGGCGGGC TGGTTTCCAT GGTTGCAOGA TTAGGAACCA 120

CCAGCTGCTG CATCCCATGG CCAGGGGTGG CGTCCAGGTG GCAGAGCAGC TAGGAACGCA 180

AGGCCTGAAC CTGCGGCCAG ACACCCTGCT CTCCCGGCCA TGGTCAACGA CCCTCCAGTA 240

CCTGCCTTAC TGTGGGCCCA GGAGGTGGGC CAAGTCTTGG CAGGCCGTGC CCGCAGGCTG 300

CTGCTGCAGT TTGGGGTGCT CTTCTGCACC ATCCTCCTTT TGCTCTGGGT GTCTGTCTTC 360

CTCTATGGCT CCTTCTACTA TTCCTATATG CCGACAGTCA GCCACCTCAG CCCTGTGCAT 420

TTCTACTACA GGACCGACTG TGATTCCTCC ACCACCTCAC TCTGCTCCTT CCCTGTTGCC 480

AATGTCTCGC TGACTAAGGG TGGACGTGAT CGGGTGCTGA TGTATGGACA GCCGTATCGT 540

GTTACCTTAG AGCTTGAGCT GCCAGAGTCC CCTGTGAATC AAGATTTGGG CATGTTCTTG 600

GTCACCATTT CCTGCTACAC CAGAGGTGGC CGAATCATCT CCACTTCTTC GCGTTCGGTG 660

ATGCTGCATT ACCGCTCAGA CCTGCTCCAG ATGCTGGACA CACTGGTCTT CTCTAGCCTC 720

CTGCTATTTG GCTTTGCAGA GCAGAAGCAG CTGCTGGAGG TGGAACTCTA CGCAGACTAT 780

AGAGAGAACT CGTACGTGCC GAGCACTGGA GCGATCATTG AGATCCACAG CAAGOGCATC 840

CAGCTGTATG GAGCCTACCT CCGCATCCAC GCGCACTTCA CTGGGCTCAG ATACCTGCTA 900

TACAACTTCC CGATGACCTG CGCCTTCATA GGTGTTGCCA GCAACTTCAC CTTCCTCAGC 960

GTCATCGTGC TCTTCAGCTA CATGCAGTGG GTGTGGGGGG GCATCTGGCC CCGACACCGC 1020

TTCTCTTTGC AGGTTAACAT CCGAAAAAGA GACAATTCCC GGAAGGAAGT CCAACGAAGG 1080

ATCTCTGCTC ATCAGCCAGG GCCTGAAGGC CAGGAGGAGT CAACTCCGCA ATCAGATGTT 1140

ACAGAGGATG GTGAGAGCCC TGAAGATCCC TCAGGGACAG AGGTCAGCTG TCCGAGGAGG 1200

AGAAACCAGA TCAGCAGCCC CTGAGCGGAG AAGAGGAGCT AGAGCCTGAG GCCAGTGATG 1260

GTTCAGGCTC CTGGGAAGAT GCAGCTTTGC TGACGGAGGC CAACCTGCCT GCTCCTGCTC 1320

CTGCTTCTGC TTCTGCCCCT GTCCTAGAGA CTCTGGGCAG CTCTGAACCT GCTGGGGGTG 1380

CTCTC0GACA GCGCCCCACC TGCTCTAGTT CCTGAAGAAA AGGGGCAGAC TCCTCACATT 1440

CCAGCACTTT CCCACCTGAC TCCTCTCCCC TCGTTTTTCC TTCAATAAAC TATTTTGTGT 1500

CAAAAAAAAA AAAAAAAAAA AATTCCTGCG GCCGC 1535

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1322 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

GAATTCGGCA CGAGGGCGGG CGCTACGGGC TTGACTCCCC CAAGGCCGAG GTCCGCGGCC 60

AGGTGCTGGC GCCGCTGCCC CTCCACGGAG TTGCTGATCA TCTGGGCTGT GATCCACAAA 120

CCCGGTTCTT TGTCCCTCCT AATATCAAAC AGTGGATTGC CTTGCTGCAG AGGGGAAACT 180

GCACGTTTAA AGAGAAAATA TCACGGGCCG CTTTCCACAA TGCAGTTGCT GTAGTCATCT 240

ACAATAATAA ATCCAAAGAG GAGCCAGTTA CCATGACTCA TCCAGGCACT GGAGATATTA 300

TTGCTGTCAT GATAACAGAA TTGAGGGGTA AGGATATTTT GAGTTATCTG GAGAAAAACA 360

TCTCTGTACA AATGACAATA GCTGTTGGAA CTCGAATGCC ACCGAAGAAC TTCAQCCGTG 420

GCTCTCTAGT CTTCGTGTCA ATATCCTTTA TTGTTTTGAT GATTATTTCT TCAGCATGGC 480

TCATATTCTA CTTCATTCAA AAGATCAGGT ACACAAATGC ACGCGACAGG AACCAGCGTC 540

GTCTCGGAGA TGCAGCCAAG AAAGCCATCA GTAAATTGAC AACCAGGACA GTAAAGAAGG 600

GTGACAAGGA AACTGACCCA GACTTTGATC ATTGTGCAGT CTGCATAGAG AGCTATAAGC 660

AGAATGATGT CGTCCGAATT CTCCCCTGCA AGCATGTTTT CCAQWU^TCC [illegible] 720

CCTGGCTTAG TGAACATTGT ACCTGTCCTA TGTGCAAACT TAATATATTG AAGGCCCTGG 780

GAATTGTGCC GAATTTGCCA TGTACTGATA ACGTAGCATT CGATATGGAA AGGCTCACCA 840

GAACCCAAGC TGTTAACCGA AGATCAGCCC TCGGCGACCT CGCCGGCGAC AACTCCCTTG 900

GCCTTGAGCC ACTTCGAACT TCGGGGATCT CACCTCTTCC TCAGGATGGG GAGCTCACTC 960

CGAGAACAGG AGAAATCAAC ATTGCAGTAA CAAAAGAATG GTTTATTATT GCCAGTTTTG 1020

GCCTCCTCAG TGCCCTCACA CTCTGCTACA TGATCATCAG AGCCACAGCT AGCTTGAATG 1080

CTAATGAGGT AGAATGGTTT TGAAGAAGAA AAAACCTGCT TTCTGACTGA TTTTGCCTTG 1140

AAGGAAAAAA GAACCTATTT TTGTGCATCA TTTACCAATC ATGCCACACA AGCATTTATT 1200

TTTAGTACAT TTTATTTTTT CATAAAATTG CTAATGCCAA AGCTTTGTAT TAAAAGAAAT 1260

AAATAATAAA ATAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAT TCCTGCGGCC 1320

GC 1322

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1711 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 



GAATTCGGCA CGAGGCCCTC CCGCGCTCCC GGGG0GCGCG GGCCG0GCCC CCGACGCCCT 60

ACATATACTC AGGTGCGCCC CACCTGTCCG CCCGCACCTG CTGGCTCACC TCCGAGCCAC 120

CTCTGCTGCG CACCGCAGCC TCGGACCTAC AGCCCAGGAT ACTTTGGGAC TTGCOGGCGC 180

TCAGAAACGC GCCCAGACGG CCCCTCCACC TTTTGTTTGC CTAGGGTCGC CGAGAGOGCC 240

CGGAGGGAAC CGCCTGGCCT TCGGGGACCA CCAATTTTGT CTGGAACCAC CCTCCCGGCG 300

TATCCTACTC CCTGTGCCGC GAGGCCATCG CTTCACTGGA GGGGTCGATT TGTGTGTAGT 360

TTGGTGACAA GATTTGCATT CACCTGGCCC AAACCCTTTT TGTCTCTTTG GGTGACCGGA 420

AAACTCCACC TCAAGTTTTC TTTTGTGGGG CTGCCCCCCA AGTGTCGTTT GTTTTACTGT 480

AGGGTCTCCC GCCCGGCGCC CCGAGTGTTT TCTGAGGGCG GAAATGGCCA ATTCGGGCCT 540

GCAGTTGCTG GGCTTCTCCA TGGCCCTGCT GGGCTGGGTG GGTCTGGTGG CCTGCACCGC 600

CATCCCGCAG TGGCAGATGA GCTCCTATGC GGGTGACAAC ATCATCACGG CCCAGGCCAT 660

GTAGAAGGGG CTGTGGATGG ACTGCGTCAC GCAGAGCACG GGGATGATGA GCTGCAAAAT 720

GTAOGACTCG GTGCTCGCCC TGTCCGCGGC CTTGCAGGCC ACTCGAGCCC TAATGGTGGT 780

CTCCCTGGTG CTGGGCTTCC TGGCCATGTT TGTGGCCACG ATGGGCATGA ACTGCACGCG 840

CTGTGGGGGA GACGACAAAG TGAAGAAGGC CCGTATAGCC ATGGGTGGAG GCATAATTTT 900

CATCGTGGCA GGTCTTGCCG CCTTGGTAGC TTGCTCCTGG TATGGCCATC AGATTGTCAC 960

AGACTTTTAT AACCCTTTGA TCCCTACCAA CATTAAGTAT GAGTTTGGCC CTGCCATCTT 1020

TATTGGCTGG GCAGGGTCTG CCCTAGTCAT CCTGGGAGGT GCACTGCTCT CCTGTTCCTG 1080

TCCTGGGAAT GAGAGCAAGG CTGGGTACCG TGCACCCCGC TCTTACCCTA AGTCCAACTC 1140

TTCCAAGGAG TATGTGTGAC CTGGGATCTC CTTGCCCCAG CCTGACAGGC TATGGGAGTG 1200

TCTAGATGCC TGAAAGGGCC TGGGGCTGAG CTCAGCCTGT GGGCAGGGTG CCGGACAAAG 1260

GCCTCCTGGT CACTCTGTCC CTGCACTCCA TGTATAGTCC TCTTGGGTTG GGGGTGGGGG 1320

GGTGCCGTTG GTGGGAGAGA CAAAAAGAGG GAGAGTGTGC TTTTTGTACA GTAATAAAAA 1380

ATAAGTATTG GGAAGCAGGC TTTTTTCCCT TCAGGGCCTC TGCTTTCCTC CCGTCCAGAT 1440

CCTTGCAGGG AGCTTGGAAC CTTAGTGCAC CTACTTCAGT TCAGAAGACT TAGCACCCCA 1500

CTGACTCCAC TGACAATTGA CTAAAAGATG CAGGTGCTCG TATCTCGACA TTCATTCCCA 1560

CCCCCCTCTT ATTTAAATAG CTACCAAAGT ACTTCTTTTT TAATAAAAAA ATAAAGATTT 1620

TTATTAGGTA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 1680



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1553 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 



GAATTCGGCA CGAGGGCAGG TCCAGAGTAA AGTCACTGAA GAGTGGAAGC GAGGAAGGAA 60

CAGGATGATT AGACCTCAGC TGCGGAC0GC GGGGCTGGGA CGATGCCTCC TGCCGGGGCT 120

GCTGCTGCTC CTGGTGCCCG TCCTCTGGGC CGGGGCTGAA AAGCTACATA CCCAGCCCTC 180

CTGCCCCGCG GTCTGCCAGC CCACGCGCTG CCCCGCGCTG CCCACCTGCG CGCTGGGGAC 240

CACGCCGGTG TTCGACCTGT GCCGCTGTTG CCGCGTCTGC CCCGCGGCCG AGCGTGAAGT 300

CTGCGGCGGG GCGCAGGGCC AACCGTGOGC CCCGGGGCTG CAGTGCCTCC AGCCGCTGCG 360

CCCCGGGTTC CCCAGCACCT GCGGTTGCCC GACGCTGGGA GGGGCCGTGT GCGGCAGCGA 420

CAGGCGCACC TACCCCAGCA TGTGCGCGCT CCGGGCCGAA AACCGCGCCG OGCGCCGCCT 480

GGGCAAGGTC CCGGCCGTGC CTGTGCAGTG GGGGAACTGC GGGQATACAG GGACCAGAAG 540

CGCAGGCCCG CTCAGGAGGA ATTACAACTT CATCGCCGCG GTGGTGGAGA AGGTGGCGCC 600

ATCGGTGGTT CACGTGCAGC TGTGGGGCAG GTTACTTCAC GGCAGCAGGC TTGTTCCTGT 660

GTACAGTGGC TCTGGGTTCA TAGTGTCTGA GGACGGGCTC ATTATTACCA ATGCCCATGT 720

TGTCAGGAAC CAGCAGTGOA TTGAGGTGGT GCTCCAGAAT GGGGCCCGTT ATGAAGCTGT 780

TGTCAAGGAT ATTGACCTTA AATTGGATCT TGCGGTGATT AAGATTGAAT CAAATGCTGA 840

ACTTCCTGTA CTGATGCTGG GAAGATCATC TGACCTTCGG GCTGGAGAGT TTGTGGTGGC 900

TTTGGGCAGC CCATTTTCTC TGCAGAACAC AGCTACTGCA GGAATTGTCA GCACCAAACA 960

GCGAGGGGGC AAAGAACTGG GGATGAAGGA TTCAGATATG GACTACGTCC AGATTGATGC 1020

CACAATTAAC TATGGGAATT CTGGTGGTCC TCTGGTGAAC TTGGATGGTG ATGTGATTGG 1080

CGTCAATTCA TTGAGGGTGA CTGATGGAAT CTCCTTTGCA ATTCCTTCAG ATCGAGTTAG 1140

GCAGTTCTTG GCAGAATACC ATGAGCACCA GATGAAAGGA AAGGCGTTTT CAAATAAGAA 1200

ATATCTGGGT CTGCAAATGC TGTCCCTCAC TGTGCCCCTT AGTGAAGAAT TGAAAATGCA 1260

TTATCCAGAT TTCCCTGATG TGAGTTCTGG GGTTTATGTA TGTAAAGTGG TTGAAGGAAC 1320



AAAAAAAAAA AAAAAAAATT CCTGCGGCCG C



1711 
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AGCTGCTCAA AGCTCTGGAT TGAGAGATCA CGATGTAATT GTCAACATAA ATGGGAAACC 
TATTACTACT ACAACTGATG TTGTTAAAGC TCTTGACAGT GATTCCCTTT CCATGGCTGT 
TCTTCGGGGA AAAGATAATT TGCTCCTGAC AGTCATACCT GAAACAATCA ATTAAATATC 
TTGTTTTAAA GTGGGATTAT CTAAAAAAAA AAAAAAAAAA TTCCTGCGGC CGC 

(2) INFORMATION FOR SEQ ID NO: 19: 

(1) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1596 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

GAATTCGGCA CGAGGGGAGC CGCTCCC6GA GCCCGGCCGT AGA6GCT6CA ATC6CAGCCG 
GGAGCCC6CA GCCCGC6CCC CGAGCCCGCC GCCGCCCTTC 6AGGGCGCCC CAGGCCGC6C 
CATG6T6AAG GTGAC6TTCA ACTCC6CTCT G6CCCAGAAG 6AGGCCAAGA AG6ACGA6CC 
CGAGAGC66C 6AGGAG6CGC TCATCATCCC CCCC6AC6CC GTC6CGGTG6 ACTGCAAGGA 
CCCAGATGAT GTGGTACCAG TTGGCCAAA6 AAGAGCCTGG TGTTGGTGCA TGTGCTTTG6 
' . ""^^ ACTAGCATTT "ATGCTTGCAG GTGTTATTCT AG^ TACTTGTACA *ATATTTTGC ^ 

ACTTCAACCA GATGACGTGT ACTACTGTGG AATAAAGTAC ATCAAAGATG ATGTCATCTT 
AAATGAGCCC TCTGCAGATG CCCCAGCTGC TCTCTACCAG ACAATTGAAG AAAATATTAA 
AATCTTTGAA GAAGAAGAAG TTGAATTTAT CAGTGTGCCT GTCCCAGAGT TTGCAGATAG 
TGATCCTGCC AACATTGTTC ATGACTTTAA CAAGAAACTT ACAGCCTATT TAGATCTTAA 
CCTGGATAAG TGCTATGTGA TCCCTCTGAA CACTTCCATT GTTATGCCAC CCAGAAACCT 
ACTGGAGTTA CTTATTAACA TCAAGGCTGG AACCTATTTG CCTCAGTCCT ATCTGATTCA 
TGAGCACATG GTTATTACTG ATCGCATTGA AAACATTGAT CACCTGGGTT TCTTTATTTA 
TCGACTGTGT CATGACAAGG AAACTTACAA ACTGCAACGC A6AGAAACTA TTAAAGGTAT 
TCAGAAACGT GAAGCCAGCA ATTGTTTCGC AATTCGGCAT TTTGAAAACA AATTTGCOGT 
GGIUU^CTTTA ATTTGTTCTT GAACAGTCAA GAAAAACATT ATTGAGGAAA ATTAATATCA 
CA6CATAACC CCACCCTTTA CATTTTGTGC AGTGATATTT TTTAAAGTCT CTTTCATGTA 
AGTAGCAAAC AGGGCTTTAC TATCTTTTCA TCTCATTAAT TCAATTAAAA CCATTACCTT 
AAAATTTTTT TCTTTCGAAG TGTGGTGTCT TTTATATTTG AATTAGTAAC TGTATGAA6T 



1380 
1440 
1500 
1553 



60 
120 
180 
240 
300 

420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
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CATAGATAAT AGTACATGTC ACCTTAGGTA GTAGGAAGAA TTACAATTTC TTTAAATCAT 1200 

TTATCTGGAT TTTTATGTTT TATTAGCATT TTCAAGAAGA CGGATTATCT AGAGAATAAT 1260 

CATATATATG CATACGTAAA AATGGACCAC AGTGACTTAT TTGTAGTTGT TAGTTGCCCT 1320 

GCTACCTAGT TTGTTAGTGC ATTTGAGCAC ACATTTTAAT TTTCCTCTAA TTAAAATGTG 1380 

GAGTATTTTC AGTGTCAAAT ATATTTAACT ATTTAGAGAA TGATTTCCAC CTTTATGTTT 1440 

TAATATCCTA GGCATCTGCT GTAATAATAT TTTAGAAAAT GTTTGGAATT TAAGAAATAA 1500

CTTGTGTTAC TAATTTGTAT AACCCATATC TGTGCAATGG AATATAAATA TCACAAAGTT 1560 

GTTTAAAAAA AAAAAAAAAA AAATTCCTGC GGCCGC 1596 



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 400 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

Met Ala Trp Arg Arg Arg Glu Ala Gly Val Gly Ala T^g Gly Val Leu
1 5 10 15

Ala Leu Ala Leu Leu Ala Leu Ala Leu Cys Val Pro Gly Ala Arg Gly
20 25 30

Arg Ala Leu Glu Trp Phe Ser Ala Val Val Asn Ile Glu Tyr Val Asp
35 40 45

Pro Gln Thr Asn Leu Thr Val Trp Ser Val Ser Glu Ser Gly Arg Phe
50 55 60

Gly Asp Ser Ser Pro Lys Glu Gly Ala His Gly Leu Val Gly Val Pro
65 70 75 80

Trp Ala Pro Gly Gly Asp Leu Glu Gly Cys Ala Pro Asp Thr Arg Phe
85 90 95

Phe Val Pro Glu Pro Gly Gly Arg Gly Ala Ala Pro Trp Val Ala Leu
100 105 110

Val Ala Arg Gly Gly Cys Thr Phe Lys Asp Lys Val Leu Val Ala Ala
115 120 125

Arg Arg Asn Ala Ser Ala Val Val Leu Tyr Asn Glu Glu Arg Tyr Gly
130 135 140

Asn Ile Thr Leu Pro Met Ser His Ala Gly Thr Gly Asn Ile Val Val
145 150 155 160

Ile Met Ile Ser Tyr Pro Lys Gly Arg Glu Ile Leu Glu Leu Val Gln
165 170 175

Lys Gly Ile Pro Val Thr Met Thr Ile Gly Val Gly Thr Arg His Val
180 185 190

Gln Glu Phe Ile Ser Gly Gln Ser Val Val Phe Val Ala Ile Ala Phe
195 200 205

Ile Thr Met Met Ile Ile Ser Leu Ala Trp Leu Ile Phe Tyr Tyr Ile
210 215 220

Gln Arg Phe Leu Tyr Thr Gly Ser Gln Ile Gly Ser Gln Ser His Arg
225 230 235 240

Lys Glu Thr Lys Lys Val Ile Gly Gln Leu Leu Leu His Thr Val Lys
245 250 255

His Gly Glu Lys Gly Ile Asp Val Asp Ala Glu Asn Cys Ala Val Cys
260 265 270

Ile Glu Asn Phe Lys Val Lys Asp Ile Ile Arg Ile Leu Pro Cys Lys
275 280 285

His Ile Phe His Arg Ile Cys Ile Asp Pro Trp Leu Leu Asp His Arg
290 295 300

Thr Cys Pro Met Cys Lys Leu Asp Val Ile Lys Ala Leu Gly Tyr Trp
305 310 315 320

Gly Glu Pro Gly Asp Val Gln Glu Met Pro Ala Pro Glu Ser Pro Pro
325 330 335

Gly Arg Asp Pro Ala Ala Asn Leu Ser Leu Ala Leu Pro Asp Asp Asp
340 345 350

Gly Ser Asp Asp Ser Ser Pro Pro Ser Ala Ser Pro Ala Glu Ser Glu
355 360 365

Pro Gln Cys Asp Pro Ser Phe Lys Gly Asp Ala Gly Glu Asn Thr Ala
370 375 380

Leu Leu Glu Ala Gly Arg Ser Asp Ser Arg His Gly Gly Pro Ile Ser
385 390 395 400

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 291 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

Met Asp Lys Gly Ser Ala Gly His Pro Gly Gly Val Leu Val Trp Gly
1 5 10 15

Arg Ser Pro Ala Pro Thr Ala Leu Trp Gly Ala Ser Pro Trp Leu Ser
20 25 30

Pro Leu Thr Ser Ala Leu Arg Gln Pro Leu His Arg Ala Pro Leu Leu
35 40 45

Pro Gly Gln Leu Cys Trp Ser Pro Arg Pro Leu Glu Lys Asn Lys Ala
50 55 60

Met Gly Arg Pro Leu Leu Leu Pro Leu Leu Leu Leu Leu Gln Pro Pro
65 70 75 80

Ala Phe Leu Gln Pro Gly Gly Ser Thr Gly Ser Gly Pro Ser Tyr Leu
85 90 95

Tyr Gly Val Thr Gln Pro Lys His Leu Ser Ala Ser Met Gly Gly Ser
100 105 110

Val Glu Ile Pro Phe Ser Phe Tyr Tyr Pro Trp Glu Leu Ala Ile Val
115 120 125

Pro Asn Val Arg Ile Ser Trp Arg Arg Gly His Phe His Gly Gln Ser
130 135 140

Phe Tyr Ser Thr Arg Pro Pro Ser Ile His Lys Asp Tyr Val Asn Arg
145 150 155 160

Leu Phe Leu Asn Trp Thr Glu Gly Gln Glu Ser Gly Phe Leu Arg Ile
165 170 175

Ser Asn Leu Arg Lys Glu Asp Gln Ser Val Tyr Phe Cys Arg Val Glu
180 185 190

Leu Asp Thr Arg Arg Ser Gly Arg Gln Gln Leu Gln Ser Ile Lys Gly
195 200 205

Thr Lys Leu Thr Ile Thr Gln Ala Val Thr Thr Thr Thr Thr Trp Arg
210 215 220

Pro Ser Ser Thr Thr Thr Ile Ala Gly Leu Arg Val Thr Glu Ser Lys
225 230 235 240

Gly His Ser Glu Ser Trp His Leu Ser Leu Asp Thr Ala Ile Arg Val
245 250 255

Ala Leu Ala Val Ala Val Leu Lys Thr Val Ile Leu Gly Leu Leu Cys
260 265 270

Leu Leu Leu Leu Trp Trp Arg Arg Arg Lys Gly Ser Arg Ala Pro Ser
275 280 285

Ser Asp Phe
290



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 293 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Met Thr Val Ser Gln Arg Phe Gln Leu Ser Asn Ser Gly Pro Asn Ser
1 5 10 15

Thr Ile Lys Met Lys Ile Ala Leu Arg Val Leu His Leu Glu Lys Arg
20 25 30

Glu Arg Pro Pro Asp His Gln His Ser Ala Gln Val Lys Arg Pro Ser
35 40 45

Val Ser Lys Glu Gly Arg Lys Thr Ser Ile Lys Ser His Met Ser Gly
50 55 60

Ser Pro Gly Pro Gly Gly Ser Asn Thr Ala Pro Ser Thr Pro Val Ile
65 70 75 80

Gly Gly Ser Asp Lys Pro Gly Met Glu Glu Lys Ala Gln Pro Pro Glu
85 90 95

Ala Gly Pro Gln Gly Leu His Asp Leu Gly Arg Ser Ser Ser Ser Leu
100 105 110

Leu Ala Ser Pro Gly His Ile Ser Val Lys Glu Pro Thr Pro Ser Ile
115 120 125

Ala Ser Asp Ile Ser Leu Pro Ile Ala Thr Gln Glu Leu Arg Gln Arg
130 135 140

Leu Arg Gln Leu Glu Asn Gly Thr Thr Leu Gly Gln Ser Pro Leu Gly
145 150 155 160

Gln Ile Gln Leu Thr Ile Arg His Ser Ser Gln Arg Asn Lys Leu Ile
165 170 175

Val Val Val His Ala Cys Arg Asn Leu Ile Ala Phe Ser Glu Asp Gly
180 185 190

Ser Asp Pro Tyr Val Arg Met Tyr Leu Leu Pro Asp Lys Arg Arg Ser
195 200 205

Gly Arg Arg Lys Thr His Val Ser Lys Lys Thr Leu Asn Pro Val Phe
210 215 220

Asp Gln Ser Phe Asp Phe Ser Val Ser Leu Pro Glu Val Gln Arg Arg
225 230 235 240

Thr Leu Asp Val Ala Val Lys Asn Ser Gly Gly Phe Leu Ser Lys Asp
245 250 255

Lys Gly Leu Leu Gly Lys Val Leu Val Ala Leu Ala Ser Glu Glu Leu
260 265 270

Ala Lys Gly Trp Thr Gln Trp Tyr Asp Leu Thr Glu Asp Gly Thr Arg
275 280 285

Pro Gln Ala Met Thr
290

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 206 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

Met Glu Arg Arg His Pro Val Cys Ser Gly Thr Cys Gln Pro Thr Gln
1 5 10 15

Phe Arg Cys Ser Asn Gly Cys Cys Ile Asp Ser Phe Leu Glu Cys Asp
20 25 30

Asp Thr Pro Asn Cys Pro Asp Ala Ser Asp Glu Ala Ala Cys Glu Lys
35 40 45

Tyr Thr Ser Gly Phe Asp Glu Leu Gln Arg Ile His Phe Pro Ser Asp
50 55 60

Lys Gly His Cys Val Asp Leu Pro Asp Thr Gly Leu Cys Lys Glu Ser
65 70 75 80

Ile Pro Arg Trp Tyr Tyr Asn Pro Phe Ser Glu His Cys Ala Arg Phe
85 90 95

Thr Tyr Gly Gly Cys Tyr Gly Asn Lys Asn Asn Phe Glu Glu Glu Gln
100 105 110

Gln Cys Leu Glu Ser Cys Arg Gly Ile Ser Lys Lys Asp Val Phe Gly
115 120 125

Leu Arg Arg Glu Ile Pro Ile Pro Ser Thr Gly Ser Val Glu Met Ala
130 135 140

Val Ala Val Phe Leu Val Ile Cys Ile Val Val Val Val Ala Ile Leu
145 150 155 160

Gly Tyr Cys Phe Phe Lys Asn Gln Arg Lys Asp Phe His Gly His His
165 170 175

His His Pro Pro Pro Thr Pro Ala Ser Ser Thr Val Ser Thr Thr Glu
180 185 190

Asp Thr Glu His Leu Val Tyr Asn His Thr Thr Arg Pro Leu
195 200 205

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 220 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

Met Ala Gly Leu Ser Arg Gly Ser Ala Arg Ala Leu Leu Ala Ala Leu
1 5 10 15

Leu Ala Ser Thr Leu Leu Ala Leu Leu Val Ser Pro Ala Arg Gly Arg
20 25 30

Gly Gly Arg Asp His Gly Asp Trp Asp Glu Ala Ser Arg Leu Pro Pro
35 40 45

Leu Pro Pro Arg Glu Asp Ala Ala Arg Val Ala Arg Phe Val Thr His
50 55 60

Val Ser Asp Trp Gly Ala Leu Ala Thr Ile Ser Thr Leu Glu Ala Val
65 70 75 80

Arg Gly Arg Pro Phe Ala Asp Val Leu Ser Leu Ser Asp Gly Pro Pro
85 90 95

Gly Ala Gly Ser Gly Val Pro Tyr Phe Tyr Leu Ser Pro Leu Gln Leu
100 105 110

Ser Val Ser Asn Leu Gln Glu Asn Pro Tyr Ala Thr Leu Thr Met Thr
115 120 125

Leu Ala Gln Thr Asn Phe Cys Lys Lys His Gly Phe Asp Pro Gln Ser
130 135 140

Pro Leu Cys Val His Ile Met Leu Ser Gly Thr Val Thr Lys Val Asn
145 150 155 160

Glu Thr Glu Met Asp Ile Ala Lys His Ser Leu Phe Ile Arg His Pro
165 170 175

Glu Met Lys Thr Trp Pro Ser Ser His Asn Trp Phe Phe Ala Lys Leu
180 185 190

Asn Ile Thr Asn Ile Trp Val Leu Asp Tyr Phe Gly Gly Pro Lys Ile
195 200 205

Val Thr Pro Glu Glu Tyr Tyr Asn Val Thr Val Gln
210 215 220



(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 197 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

Met Asp His His Cys Pro Trp Leu Asn Asn Cys Val Gly His Tyr Asn
1 5 10 15

His Arg Tyr Phe Phe Ser Phe Cys Phe Phe Met Thr Leu Gly Cys Val
20 25 30

Tyr Cys Ser Tyr Gly Ser Trp Asp Leu Phe Arg Glu Ala Tyr Ala Ala
35 40 45

Ile Glu Lys Met Lys Gln Leu Asp Lys Asn Lys Leu Gln Ala Val Ala
50 55 60

Asn Gln Thr Tyr His Gln Thr Pro Pro Pro Thr Phe Ser Phe Arg Glu
65 70 75 80

Arg Met Thr His Lys Ser Leu Val Tyr Leu Trp Phe Leu Cys Ser Ser
85 90 95

Val Ala Leu Ala Leu Gly Ala Leu Thr Val Trp His Ala Val Leu Ile
100 105 110

Ser Arg Gly Glu Thr Ser Ile Glu Arg His Ile Asn Lys Lys Glu Arg
115 120 125

Arg Arg Leu Gln Ala Lys Gly Arg Val Phe Arg Asn Pro Tyr Asn Tyr
130 135 140

Gly Cys Leu Asp Asn Trp Lys Val Phe Leu Gly Val Asp Thr Gly Arg
145 150 155 160

His Trp Leu Thr Arg Val Leu Leu Pro Ser Thr His Leu Pro His Gly
165 170 175

Asn Gly Met Ser Trp Glu Pro Pro Pro Trp Val Thr Ala His Ser Ala
180 185 190

Ser Val Met Ala Val
195



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 451 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

Met Ala Pro Leu Gly Met Leu Leu Gly Leu Leu Met Ala Ala Cys Phe
1 5 10 15

Thr Phe Cys Leu Ser His Gln Asn Leu Lys Glu Phe Ala Leu Thr Asn
20 25 30

Pro Glu Lys Ser Ser Thr Lys Glu Thr Glu Arg Lys Glu Thr Lys Ala
35 40 45

Glu Glu Glu Leu Asp Ala Glu Val Leu Glu Val Phe His Pro Thr His
50 55 60

Glu Trp Gln Ala Leu Gln Pro Gly Gln Ala Val Pro Ala Gly Ser His
65 70 75 80

Val Arg Leu Asn Leu Gln Thr Gly Glu Arg Glu Ala Lys Leu Gln Tyr
85 90 95

Glu Asp Lys Phe Arg Asn Asn Leu Lys Gly Lys Arg Leu Asp Ile Asn
100 105 110

Thr Asn Thr Tyr Thr Ser Gln Asp Leu Lys Ser Ala Leu Ala Lys Phe
115 120 125

Lys Glu Gly Ala Glu Met Glu Ser Ser Lys Glu Asp Lys Ala Arg Gln
130 135 140

Ala Glu Val Lys Arg Leu Phe Arg Pro Ile Glu Glu Leu Lys Lys Asp
145 150 155 160

Phe Asp Glu Leu Asn Val Val Ile Glu Thr Asp Met Gln Ile Met Val
165 170 175

Arg Leu Ile Asn Lys Phe Asn Ser Ser Ser Ser Ser Leu Glu Glu Lys
180 185 190

Ile Ala Ala Leu Phe Asp Leu Glu Tyr Tyr Val His Gln Met Asp Asn
195 200 205

Ala Gln Asp Leu Leu Ser Phe Gly Gly Leu Gln Val Val Ile Asn Gly
210 215 220

Leu Asn Ser Thr Glu Pro Leu Val Lys Glu Tyr Ala Ala Phe Val Leu
225 230 235 240

Gly Ala Ala Phe Ser Ser Asn Pro Lys Val Gln Val Glu Ala Ile Glu
245 250 255

Gly Gly Ala Leu Gln Lys Leu Leu Val Ile Leu Ala Thr Glu Gln Pro
260 265 270

Leu Thr Ala Lys Lys Lys Val Leu Phe Ala Leu Cys Ser Leu Leu Arg
275 280 285

His Phe Pro Tyr Ala Gln Arg Gln Phe Leu Lys Leu Gly Gly Leu Gln
290 295 300

Val Leu Arg Thr Leu Val Gln Glu Lys Gly Thr Glu Val Leu Ala Val
305 310 315 320

Arg Val Val Thr Leu Leu Tyr Asp Leu Val Thr Glu Lys Met Phe Ala
325 330 335

Glu Glu Glu Ala Glu Leu Thr Gln Glu Met Ser Pro Glu Lys Leu Gln
340 345 350

Gln Tyr Arg Gln Val His Leu Leu Pro Gly Leu Trp Glu Gln Gly Trp
355 360 365

Cys Glu Ile Thr Ala His Leu Leu Ala Leu Pro Glu His Asp Ala Arg
370 375 380

Glu Lys Val Leu Gln Thr Leu Gly Val Leu Leu Thr Thr Cys Arg Asp
385 390 395 400

Arg Tyr Arg Gln Asp Pro Gln Leu Gly Arg Thr Leu Ala Ser Leu Gln
405 410 415

Ala Glu Tyr Gln Val Leu Ala Ser Leu Glu Leu Gln Asp Gly Glu Asp
420 425 430

Glu Gly Tyr Phe Gln Glu Leu Leu Gly Ser Val Asn Ser Leu Leu Lys
435 440 445

Glu Leu Arg
450

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 254 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

Met Trp Gln Ala Gly Lys Arg Gln Ala Ser Arg Ala Phe Ser Leu Tyr
1 5 10 15

Ala Asn Ile Asp Ile Leu Arg Pro Tyr Phe Asp Val Glu Pro Ala Gln
20 25 30

Val Arg Ser Arg Leu Leu Glu Ser Met Ile Pro Ile Lys Met Val Asn
35 40 45

Phe Pro Gln Lys Ile Ala Gly Glu Leu Tyr Gly Pro Leu Met Leu Val
50 55 60

Phe Thr Leu Val Ala Ile Leu Leu His Gly Met Lys Thr Ser Asp Thr
65 70 75 80

Ile Ile Arg Glu Gly Thr Leu Met Gly Thr Ala Ile Gly Thr Cys Phe
85 90 95

Gly Tyr Trp Leu Gly Val Ser Ser Phe Ile Tyr Phe Leu Ala Tyr Leu
100 105 110

Cys Asn Ala Gln Ile Thr Met Leu Gln Met Leu Ala Leu Leu Gly Tyr
115 120 125

Gly Leu Phe Gly His Cys Ile Val Leu Phe Ile Thr Tyr Asn Ile His
130 135 140

Leu His Ala Leu Phe Tyr Leu Phe Trp Leu Leu Val Gly Gly Leu Ser
145 150 155 160




Thr Leu Arg Met Val Ala Val Leu Val Ser Arg Thr Val Gly Pro Thr
165 170 175

Gln Arg Leu Leu Leu Cys Gly Thr Leu Ala Ala Leu His Met Leu Phe
180 185 190

Leu Leu Tyr Leu His Phe Ala Tyr His Lys Val Val Glu Gly Ile Leu
195 200 205

Asp Thr Leu Glu Gly Pro Asn Ile Pro Pro Ile Gln Arg Val Pro Arg
210 215 220

Asp Ile Pro Ala Met Leu Pro Ala Ala Arg Leu Pro Thr Thr Val Leu
225 230 235 240

Asn Ala Thr Ala Lys Ala Val Ala Val Thr Leu Gln Ser His
245 250



(2) INFORMATION FOR SEQ ID NO: 28:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 221 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 



Met Gly Ser Glu Asn Glu Ala Leu Asp Leu Ser Met Lys Ser Val Pro
1 5 10 15

Trp Leu Lys Ala Gly Glu Val Ser Pro Pro Ile Phe Gln Glu Asp Ala
20 25 30

Ala Leu Asp Leu Ser Val Ala Ala His Arg Lys Ser Glu Pro Pro Pro
35 40 45

Glu Thr Leu Tyr Asp Ser Gly Ala Ser Val Asp Ser Ser Gly His Thr
50 55 60

Val Met Glu Lys Leu Pro Ser Gly Met Glu Ile Ser Phe Ala Pro Ala
65 70 75 80

Thr Ser His Glu Ala Pro Ala Met Met Asp Ser His Ile Ser Ser Ser
85 90 95
®^ His pro Ser Oly Olu 



105 



110 



. ..n lie Gl« Mel: Val Oly Glu ser Gin Ala Ala 
Val Lye Ala Glu Aan Asn He Glu 

120 

, Ala val pro Thr He Phe Cya Gly 

I.ys val He val Ser Val Glu Asp Ala Val 

135 

"° ... Glv val ser Thr Lys Aen Phe Ser Phe I.ys 

tys He Lye Gly Leu Ser Oly Val s 

n KO lbs 

Tie Asn ser Gin Gly Glu 

oiu «P s« V.1 oi» «v 

* T^a Pro lie Lys Asn Arg 

5.r M.t Oly ».» »!• 

Q*^ 185 

3. ». v.. «n se. O.- V. ^ - 

195 

x« "« "° 

210 

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 266 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
Met Val Lys Val Thr Phe Asn Ser Ala Leu Ala Gln Lys Glu Ala Lys
1 5 10 15

Lys Asp Glu Pro Lys Ser Gly Glu Glu Ala Leu Ile Ile Pro Pro Asp
20 25 30

Ala Val Ala Val Asp Cys Lys Asp Pro Asp Asp Val Val Pro Val Gly
35 40 45






Gln Arg Arg Ala Trp Cys Trp Cys Met Cys Phe Gly Leu Ala Phe Met
50 55 60

Leu Ala Gly Val Ile Leu Gly Gly Ala Tyr Leu Tyr Lys Tyr Phe Ala
65 70 75 80

Leu Gln Pro Asp Asp Val Tyr Tyr Cys Gly Ile Lys Tyr Ile Lys Asp
85 90 95

Asp Val Ile Leu Asn Glu Pro Ser Ala Asp Ala Pro Ala Ala Leu Tyr
100 105 110

Gln Thr Ile Glu Glu Asn Ile Lys Ile Phe Glu Glu Glu Glu Val Glu
115 120 125

Phe Ile Ser Val Pro Val Pro Glu Phe Ala Asp Ser Asp Pro Ala Asn
130 135 140

Ile Val His Asp Phe Asn Lys Lys Leu Thr Ala Tyr Leu Asp Leu Asn
145 150 155 160

Leu Asp Lys Cys Tyr Val Ile Pro Leu Asn Thr Ser Ile Val Met Pro
165 170 175

Pro Arg Asn Leu Leu Glu Leu Leu Ile Asn Ile Lys Ala Gly Thr Tyr
180 185 190

Leu Pro Gln Ser Tyr Leu Ile His Glu His Met Val Ile Thr Asp Arg
195 200 205

Ile Glu Asn Ile Asp His Leu Gly Phe Phe Ile Tyr Arg Leu Cys His
210 215 220

Asp Lys Glu Thr Tyr Lys Leu Gln Arg Arg Glu Thr Ile Lys Gly Ile
225 230 235 240

Gln Lys Arg Glu Ala Ser Asn Cys Phe Ala Ile Arg His Phe Glu Asn
245 250 255

Lys Phe Ala Val Glu Thr Leu Ile Cys Ser
260

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 251 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

Met Pro Thr Gly Asp Phe Asp Ser Lys Pro Ser Trp Ala Asp Gln Val
1 5 10 15

Glu Glu Glu Gly Glu Asp Asp Lys Cys Val Thr Ser Glu Leu Leu Lys
20 25 30

Gly Ile Pro Leu Ala Thr Gly Asp Thr Ser Pro Glu Pro Glu Leu Leu
35 40 45

Pro Gly Ala Pro Leu Pro Pro Pro Lys Glu Val Ile Asn Gly Asn Ile
50 55 60

Lys Thr Val Thr Glu Tyr Lys Ile Asp Glu Asp Gly Lys Lys Phe Lys
65 70 75 80

Ile Val Arg Thr Phe Arg Ile Glu Thr Arg Lys Ala Ser Lys Ala Val
85 90 95

Ala Arg Arg Lys Asn Trp Lys Lys Phe Gly Asn Ser Glu Phe Asp Pro
100 105 110

Pro Gly Pro Asn Val Ala Thr Thr Thr Val Ser Asp Asp Val Ser Met
115 120 125

Thr Phe Ile Thr Ser Lys Glu Asp Leu Asn Cys Gln Glu Glu Glu Asp
130 135 140

Pro Met Asn Lys Phe Lys Gly Gln Lys Ile Val Ser Cys Arg Ile Cys
145 150 155 160

Lys Gly Asp His Trp Thr Thr Arg Cys Pro Tyr Lys Asp Thr Leu Gly
165 170 175

Pro Met Gln Lys Glu Leu Ala Glu Gln Leu Gly Leu Ser Thr Gly Glu
180 185 190

Lys Glu Lys Leu Pro Gly Glu Leu Glu Pro Val Gln Ala Thr Gln Asn
195 200 205

Lys Thr Gly Lys Tyr Val Pro Pro Ser Leu Arg Asp Gly Ala Ser Arg
210 215 220

Arg Gly Glu Ser Met Gln Pro Asn Arg Arg Ala Asp Asp Asn Ala Thr
225 230 235 240

Ile Arg Val Thr Asn Leu Arg Arg Gly His Ala
245 250

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 377 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

Met Arg Arg Leu Asn Arg Lys Lys Thr Leu Ser Leu Val Lys Glu Leu
1 5 10 15

Asp Ala Phe Pro Lys Val Pro Glu Ser Tyr Val Glu Thr Ser Ala Ser
20 25 30

Gly Gly Thr Val Ser Leu Ile Ala Phe Thr Thr Met Ala Leu Leu Thr
35 40 45

Ile Met Glu Phe Ser Val Tyr Gln Asp Thr Trp Met Lys Tyr Glu Tyr
50 55 60

Glu Val Asp Lys Asp Phe Ser Ser Lys Leu Arg Ile Asn Ile Asp Ile
65 70 75 80

Thr Val Ala Met Lys Cys Gln Tyr Val Gly Ala Asp Val Leu Asp Leu
85 90 95

Ala Glu Thr Met Val Ala Ser Ala Asp Gly Leu Val Tyr Glu Pro Thr
100 105 110

Val Phe Asp Leu Ser Pro Gln Gln Lys Glu Trp Gln Arg Met Leu Gln
115 120 125

Leu Ile Gln Ser Arg Leu Gln Glu Glu His Ser Leu Gln Asp Val Ile
130 135 140

Phe Lys Ser Ala Phe Lys Ser Thr Ser Thr Ala Leu Pro Pro Arg Glu
145 150 155 160

Asp Asp Ser Ser Gln Ser Pro Asn Ala Cys Arg Ile His Gly His Leu
165 170 175

Tyr Val Asn Lys Val Ala Gly Asn Phe His Ile Thr Val Gly Lys Ala
180 185 190

Ile Pro His Pro Arg Gly His Ala His Leu Ala Ala Leu Val Asn His
195 200 205

Glu Ser Tyr Asn Phe Ser His Arg Ile Asp His Leu Ser Phe Gly Glu
210 215 220

Leu Val Pro Ala Ile Ile Asn Pro Leu Asp Gly Thr Glu Lys Ile Ala
225 230 235 240

Ile Asp His Asn Gln Met Phe Gln Tyr Phe Ile Thr Val Val Pro Thr
245 250 255

Lys Leu His Thr Tyr Lys Ile Ser Ala Asp Thr His Gln Phe Ser Val
260 265 270

Thr Glu Arg Glu Arg Ile Ile Asn His Ala Ala Gly Ser His Gly Val
275 280 285

Ser Gly Ile Phe Met Lys Tyr Asp Leu Ser Ser Leu Met Val Thr Val
290 295 300

Thr Glu Glu His Met Pro Phe Trp Gln Phe Phe Val Arg Leu Cys Gly
305 310 315 320

Ile Val Gly Gly Ile Phe Ser Thr Thr Gly Met Leu His Gly Ile Gly
325 330 335

Lys Phe Ile Val Glu Ile Ile Cys Cys Arg Phe Arg Leu Gly Ser Tyr
340 345 350

Lys Pro Val Asn Ser Val Pro Phe Glu Asp Gly His Thr Asp Asn His
355 360 365

Leu Pro Leu Leu Glu Asn Asn Thr His
370 375



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 250 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

Met Gly Ser Gln His Ser Ala Ala Ala Arg Pro Ser Ser Cys Arg Arg
1 5 10 15

Lys Gln Glu Asp Asp Arg Asp Gly Leu Leu Ala Glu Arg Glu Gln Glu
20 25 30

Glu Ala Ile Ala Gln Phe Pro Tyr Val Glu Phe Thr Gly Arg Asp Ser
35 40 45

Ile Thr Cys Leu Thr Cys Gln Gly Thr Gly Tyr Ile Pro Thr Glu Gln
50 55 60

Val Asn Glu Leu Val Ala Leu Ile Pro His Ser Asp Gln Arg Leu Arg
65 70 75 80

Pro Gln Arg Thr Lys Gln Tyr Val Leu Leu Ser Ile Leu Leu Cys Leu
85 90 95

Leu Ala Ser Gly Leu Val Val Phe Phe Leu Phe Pro His Ser Val Leu
100 105 110

Val Asp Asp Asp Gly Ile Lys Val Val Lys Val Thr Phe Asn Lys Gln
115 120 125

Asp Ser Leu Val Ile Leu Thr Ile Met Ala Thr Leu Lys Ile Arg Asn
130 135 140

Ser Asn Phe Tyr Thr Val Ala Val Thr Ser Leu Ser Ser Gln Ile Gln
145 150 155 160

Tyr Met Asn Thr Val Val Ser Thr Tyr Val Thr Thr Asn Val Ser Leu
165 170 175

Ile Pro Pro Arg Ser Glu Gln Leu Val Asn Phe Thr Gly Lys Ala Glu
180 185 190

Met Gly Gly Pro Phe Ser Tyr Val Tyr Phe Phe Cys Thr Val Pro Glu
195 200 205

Ile Leu Val His Asn Ile Val Ile Phe Met Arg Thr Ser Val Lys Ile
210 215 220

Ser Tyr Ile Gly Leu Met Thr Gln Ser Ser Leu Glu Thr His His Tyr
225 230 235 240

Val Asp Cys Gly Gly Asn Ser Thr Ala Ile
245 250

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 374 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

Met Val Thr Cys Phe His Val Pro Tyr Ser Ala Leu Thr Met Phe Ile
1 5 10 15

Ser Thr Glu Gln Thr Glu Arg Asp Ser Ala Thr Ala Tyr Arg Met Thr
20 25 30

Val Glu Val Leu Gly Thr Val Leu Gly Thr Ala Ile Gln Gly Gln Ile
35 40 45

Val Gly Gln Ala Asp Thr Pro Cys Phe Gln Asp Leu Asn Ser Ser Thr
50 55 60

Val Ala Ser Gln Ser Ala Asn His Thr His Gly Thr Thr Ser His Arg
65 70 75 80

Glu Thr Gln Lys Ala Tyr Leu Leu Ala Ala Gly Val Ile Val Cys Ile
85 90 95

Tyr Ile Ile Cys Ala Val Ile Leu Ile Leu Gly Val Arg Glu Gln Arg
100 105 110

Glu Pro Tyr Glu Ala Gln Gln Ser Glu Pro Ile Ala Tyr Phe Arg Gly
115 120 125

Leu Arg Leu Val Met Ser His Gly Pro Tyr Ile Lys Leu Ile Thr Gly
130 135 140

Phe Leu Phe Thr Ser Leu Ala Phe Met Leu Val Glu Gly Asn Phe Val
145 150 155 160

Leu Phe Cys Thr Tyr Thr Leu Gly Phe Arg Asn Glu Phe Gln Asn Leu
165 170 175

Leu Leu Ala Ile Met Leu Ser Ala Thr Leu Thr Ile Pro Ile Trp Gln
180 185 190

Trp Phe Leu Thr Arg Phe Gly Lys Lys Thr Ala Val Tyr Val Gly Ile
195 200 205

Ser Ser Ala Val Pro Phe Leu Ile Leu Val Ala Leu Met Glu Ser Asn
210 215 220

Leu Ile Ile Thr Tyr Ala Val Ala Val Ala Ala Gly Ile Ser Val Ala
225 230 235 240

Ala Ala Phe Leu Leu Pro Trp Ser Met Leu Pro Asp Val Ile Asp Asp
245 250 255

Phe His Leu Lys Gln Pro His Phe His Gly Thr Glu Pro Ile Phe Phe
260 265 270

Ser Phe Tyr Val Phe Phe Thr Lys Phe Ala Ser Gly Val Ser Leu Gly
275 280 285

Ile Ser Thr Leu Ser Leu Asp Phe Ala Gly Tyr Gln Thr Arg Gly Cys
290 295 300

Ser Gln Pro Glu Arg Val Lys Phe Thr Leu Asn Met Leu Val Thr Met
305 310 315 320

Ala Pro Ile Val Leu Ile Leu Leu Gly Leu Leu Leu Phe Lys Met Tyr
325 330 335

Pro Ile Asp Glu Glu Arg Arg Arg Gln Asn Lys Lys Ala Leu Gln Ala
340 345 350

Leu Arg Asp Glu Ala Ser Ser Ser Gly Cys Ser Glu Thr Asp Ser Thr
355 360 365

Glu Leu Ala Ser Ile Leu
370

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 334 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

Met Val Asn Asp Pro Pro Val Pro Ala Leu Leu Trp Ala Gln Glu Val
1 5 10 15

Gly Gln Val Leu Ala Gly Arg Ala Arg Arg Leu Leu Leu Gln Phe Gly
20 25 30

Val Leu Phe Cys Thr Ile Leu Leu Leu Leu Trp Val Ser Val Phe Leu
35 40 45

Tyr Gly Ser Phe Tyr Tyr Ser Tyr Met Pro Thr Val Ser His Leu Ser
50 55 60

Pro Val His Phe Tyr Tyr Arg Thr Asp Cys Asp Ser Ser Thr Thr Ser
65 70 75 80

Leu Cys Ser Phe Pro Val Ala Asn Val Ser Leu Thr Lys Gly Gly Arg
85 90 95

Asp Arg Val Leu Met Tyr Gly Gln Pro Tyr Arg Val Thr Leu Glu Leu
100 105 110

Glu Leu Pro Glu Ser Pro Val Asn Gln Asp Leu Gly Met Phe Leu Val
115 120 125

Thr Ile Ser Cys Tyr Thr Arg Gly Gly Arg Ile Ile Ser Thr Ser Ser
130 135 140

Arg Ser Val Met Leu His Tyr Arg Ser Asp Leu Leu Gln Met Leu Asp
145 150 155 160

Thr Leu Val Phe Ser Ser Leu Leu Leu Phe Gly Phe Ala Glu Gln Lys
165 170 175

Gln Leu Leu Glu Val Glu Leu Tyr Ala Asp Tyr Arg Glu Asn Ser Tyr
180 185 190

Val Pro Thr Thr Gly Ala Ile Ile Glu Ile His Ser Lys Arg Ile Gln
195 200 205

Leu Tyr Gly Ala Tyr Leu Arg Ile His Ala His Phe Thr Gly Leu Arg
210 215 220

Tyr Leu Leu Tyr Asn Phe Pro Met Thr Cys Ala Phe Ile Gly Val Ala
225 230 235 240

Ser Asn Phe Thr Phe Leu Ser Val Ile Val Leu Phe Ser Tyr Met Gln
245 250 255

Trp Val Trp Gly Gly Ile Trp Pro Arg His Arg Phe Ser Leu Gln Val
260 265 270

Asn Ile Arg Lys Arg Asp Asn Ser Arg Lys Glu Val Gln Arg Arg Ile
275 280 285

Ser Ala His Gln Pro Gly Pro Glu Gly Gln Glu Glu Ser Thr Pro Gln
290 295 300






Ser Asp Val Thr Glu Asp Gly Glu Ser Pro Glu Asp Pro Ser Gly Thr
305 310 315 320

Glu Val Ser Cys Pro Arg Arg Arg Asn Gln Ile Ser Ser Pro
325 330



(2) INFOHMATION FOR SEQ ID NOs35s 

(1) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 276 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Met Thr His Pro Gly Thr Gly Asp Ile Ile Ala Val Met Ile Thr Glu 

1 5 10 15 

Leu Arg Gly Lys Asp Ile Leu Ser Tyr Leu Glu Lys Asn Ile Ser Val 

20 25 30 

Gln Met Thr Ile Ala Val Gly Thr Arg Met Pro Pro Lys Asn Phe Ser 

35 40 45 

Arg Gly Ser Leu Val Phe Val Ser Ile Ser Phe Ile Val Leu Met Ile 

50 55 60 

Ile Ser Ser Ala Trp Leu Ile Phe Tyr Phe Ile Gln Lys Ile Arg Tyr 
65 70 75 80 

Thr Asn Ala Arg Asp Arg Asn Gln Arg Arg Leu Gly Asp Ala Ala Lys 

85 90 95 

Lys Ala Ile Ser Lys Leu Thr Thr Arg Thr Val Lys Lys Gly Asp Lys 

100 105 110 

Glu Thr Asp Pro Asp Phe Asp His Cys Ala Val Cys Ile Glu Ser Tyr 

115 120 125 

Lys Gln Asn Asp Val Val Arg Ile Leu Pro Cys Lys His Val Phe His 

130 135 140 

Lys Ser Cys Val Asp Pro Trp Leu Ser Glu His Cys Thr Cys Pro Met 



145 150 155 160 

Cys Lys Leu Asn Ile Leu Lys Ala Leu Gly Ile Val Pro Asn Leu Pro 

165 170 175 

Cys Thr Asp Asn Val Ala Phe Asp Met Glu Arg Leu Thr Arg Thr Gln 

180 185 190 

Ala Val Asn Arg Arg Ser Ala Leu Gly Asp Leu Ala Gly Asp Asn Ser 

195 200 205 

Leu Gly Leu Glu Pro Leu Arg Thr Ser Gly Ile Ser Pro Leu Pro Gln 

210 215 220 

Asp Gly Glu Leu Thr Pro Arg Thr Gly Glu Ile Asn Ile Ala Val Thr 
225 230 235 240 

Lys Glu Trp Phe Ile Ile Ala Ser Phe Gly Leu Leu Ser Ala Leu Thr 

245 250 255 

Leu Cys Tyr Met Ile Ile Arg Ala Thr Ala Ser Leu Asn Ala Asn Glu 

260 265 270 

Val Glu Trp Phe 
275 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 210 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

Met Ala Asn Ser Gly Leu Gln Leu Leu Gly Phe Ser Met Ala Leu Leu 

1 5 10 15 

Gly Trp Val Gly Leu Val Ala Cys Thr Ala Ile Pro Gln Trp Gln Met 

20 25 30 

Ser Ser Tyr Ala Gly Asp Asn Ile Ile Thr Ala Gln Ala Met Tyr Lys 
35 40 45 

Gly Leu Trp Met Asp Cys Val Thr Gln Ser Thr Gly Met Met Ser Cys 

50 55 60 

Lys Met Tyr Asp Ser Val Leu Ala Leu Ser Ala Ala Leu Gln Ala Thr 
65 70 75 80 

Arg Ala Leu Met Val Val Ser Leu Val Leu Gly Phe Leu Ala Met Phe 

85 90 95 

Val Ala Thr Met Gly Met Lys Cys Thr Arg Cys Gly Gly Asp Asp Lys 

100 105 110 

Val Lys Lys Ala Arg Ile Ala Met Gly Gly Gly Ile Ile Phe Ile Val 

115 120 125 

Ala Gly Leu Ala Ala Leu Val Ala Cys Ser Trp Tyr Gly His Gln Ile 

130 135 140 

Val Thr Asp Phe Tyr Asn Pro Leu Ile Pro Thr Asn Ile Lys Tyr Glu 
145 150 155 160 

Phe Gly Pro Ala Ile Phe Ile Gly Trp Ala Gly Ser Ala Leu Val Ile 

165 170 175 

Leu Gly Gly Ala Leu Leu Ser Cys Ser Cys Pro Gly Asn Glu Ser Lys 

180 185 190 

Ala Gly Tyr Arg Ala Pro Arg Ser Tyr Pro Lys Ser Asn Ser Ser Lys 
195 200 205 

Glu Tyr 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 476 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
Met Ile Arg Pro Gln Leu Arg Thr Ala Gly Leu Gly Arg Cys Leu Leu 
1 5 10 15 

Pro Gly Leu Leu Leu Leu Leu Val Pro Val Leu Trp Ala Gly Ala Glu 

20 25 30 

Lys Leu His Thr Gln Pro Ser Cys Pro Ala Val Cys Gln Pro Thr Arg 

35 40 45 

Cys Pro Ala Leu Pro Thr Cys Ala Leu Gly Thr Thr Pro Val Phe Asp 

50 55 60 

Leu Cys Arg Cys Cys Arg Val Cys Pro Ala Ala Glu Arg Glu Val Cys 
65 70 75 80 

Gly Gly Ala Gln Gly Gln Pro Cys Ala Pro Gly Leu Gln Cys Leu Gln 

85 90 95 

Pro Leu Arg Pro Gly Phe Pro Ser Thr Cys Gly Cys Pro Thr Leu Gly 

100 105 110 

Gly Ala Val Cys Gly Ser Asp Arg Arg Thr Tyr Pro Ser Met Cys Ala 

115 120 125 

Leu Arg Ala Glu Asn Arg Ala Ala Arg Arg Leu Gly Lys Val Pro Ala 

130 135 140 

Val Pro Val Gln Trp Gly Asn Cys Gly Asp Thr Gly Thr Arg Ser Ala 
145 150 155 160 

Gly Pro Leu Arg Arg Asn Tyr Asn Phe Ile Ala Ala Val Val Glu Lys 

165 170 175 

Val Ala Pro Ser Val Val His Val Gln Leu Trp Gly Arg Leu Leu His 

180 185 190 

Gly Ser Arg Leu Val Pro Val Tyr Ser Gly Ser Gly Phe Ile Val Ser 

195 200 205 

Glu Asp Gly Leu Ile Ile Thr Asn Ala His Val Val Arg Asn Gln Gln 

210 215 220 

Trp Ile Glu Val Val Leu Gln Asn Gly Ala Arg Tyr Glu Ala Val Val 
225 230 235 240 

Lys Asp Ile Asp Leu Lys Leu Asp Leu Ala Val Ile Lys Ile Glu Ser 

245 250 255 

Asn Ala Glu Leu Pro Val Leu Met Leu Gly Arg Ser Ser Asp Leu Arg 

260 265 270 

Ala Gly Glu Phe Val Val Ala Leu Gly Ser Pro Phe Ser Leu Gln Asn 

275 280 285 

Thr Ala Thr Ala Gly Ile Val Ser Thr Lys Gln Arg Gly Gly Lys Glu 
290 295 300 



Leu Gly Met Lys Asp Ser Asp Met Asp Tyr Val Gln Ile Asp Ala Thr 
305 310 315 320 

Ile Asn Tyr Gly Asn Ser Gly Gly Pro Leu Val Asn Leu Asp Gly Asp 

325 330 335 

Val Ile Gly Val Asn Ser Leu Arg Val Thr Asp Gly Ile Ser Phe Ala 

340 345 350 

Ile Pro Ser Asp Arg Val Arg Gln Phe Leu Ala Glu Tyr His Glu His 

355 360 365 

Gln Met Lys Gly Lys Ala Phe Ser Asn Lys Lys Tyr Leu Gly Leu Gln 

370 375 380 

Met Leu Ser Leu Thr Val Pro Leu Ser Glu Glu Leu Lys Met His Tyr 
385 390 395 400 

Pro Asp Phe Pro Asp Val Ser Ser Gly Val Tyr Val Cys Lys Val Val 

405 410 415 

Glu Gly Thr Ala Ala Gln Ser Ser Gly Leu Arg Asp His Asp Val Ile 

420 425 430 

Val Asn Ile Asn Gly Lys Pro Ile Thr Thr Thr Thr Asp Val Val Lys 

435 440 445 

Ala Leu Asp Ser Asp Ser Leu Ser Met Ala Val Leu Arg Gly Lys Asp 

450 455 460 

Asn Leu Leu Leu Thr Val Ile Pro Glu Thr Ile Asn 
465 470 475 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 266 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 
Met Val Lys Val Thr Phe Asn Ser Ala Leu Ala Gln Lys Glu Ala Lys 

1 5 10 15 

Lys Asp Glu Pro Glu Ser Gly Glu Glu Ala Leu Ile Ile Pro Pro Asp 

20 25 30 

Ala Val Ala Val Asp Cys Lys Asp Pro Asp Asp Val Val Pro Val Gly 

35 40 45 

Gln Arg Arg Ala Trp Cys Trp Cys Met Cys Phe Gly Leu Ala Phe Met 

50 55 60 

Leu Ala Gly Val Ile Leu Gly Gly Ala Tyr Leu Tyr Lys Tyr Phe Ala 
65 70 75 80 

Leu Gln Pro Asp Asp Val Tyr Tyr Cys Gly Ile Lys Tyr Ile Lys Asp 

85 90 95 

Asp Val Ile Leu Asn Glu Pro Ser Ala Asp Ala Pro Ala Ala Leu Tyr 

100 105 110 

Gln Thr Ile Glu Glu Asn Ile Lys Ile Phe Glu Glu Glu Glu Val Glu 

115 120 125 

Phe Ile Ser Val Pro Val Pro Glu Phe Ala Asp Ser Asp Pro Ala Asn 

130 135 140 

Ile Val His Asp Phe Asn Lys Lys Leu Thr Ala Tyr Leu Asp Leu Asn 
145 150 155 160 

Leu Asp Lys Cys Tyr Val Ile Pro Leu Asn Thr Ser Ile Val Met Pro 

165 170 175 

Pro Arg Asn Leu Leu Glu Leu Leu Ile Asn Ile Lys Ala Gly Thr Tyr 

180 185 190 

Leu Pro Gln Ser Tyr Leu Ile His Glu His Met Val Ile Thr Asp Arg 

195 200 205 

Ile Glu Asn Ile Asp His Leu Gly Phe Phe Ile Tyr Arg Leu Cys His 

210 215 220 

Asp Lys Glu Thr Tyr Lys Leu Gln Arg Arg Glu Thr Ile Lys Gly Ile 
225 230 235 240 

Gln Lys Arg Glu Ala Ser Asn Cys Phe Ala Ile Arg His Phe Glu Asn 

245 250 255 

Lys Phe Ala Val Glu Thr Leu Ile Cys Ser 
260 265 
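
By way of illustration only, the declared LENGTH of an entry in the sequence listing can be cross-checked against the residues actually listed. The following is a minimal sketch, assuming the listing is written with the standard three-letter amino acid codes (for example Gln and Ile) and that position markers are plain integers. The names `THREE_TO_ONE`, `parse_listing`, and `check_length` are hypothetical and do not appear in the application.

```python
# Illustrative sketch only: helper names are hypothetical, not from the application.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_listing(text):
    """Convert three-letter residue codes in `text` to a one-letter string.

    Purely numeric tokens (position markers) are skipped. Any other
    unrecognized token raises ValueError so that transcription damage
    is reported rather than silently dropped.
    """
    residues = []
    for token in text.split():
        if token.isdigit():
            continue
        if token not in THREE_TO_ONE:
            raise ValueError(f"unrecognized residue code: {token!r}")
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


def check_length(text, declared_length):
    """Return True if the parsed residue count equals the declared LENGTH."""
    return len(parse_listing(text)) == declared_length


# Last two rows of SEQ ID NO:38 (residues 241 to 266), with position markers.
tail = """
Gln Lys Arg Glu Ala Ser Asn Cys Phe Ala Ile Arg His Phe Glu Asn
245 250 255
Lys Phe Ala Val Glu Thr Leu Ile Cys Ser
260 265
"""
tail_sequence = parse_listing(tail)
```

Applied to a complete entry, `check_length(entry_text, 266)` would be expected to hold for SEQ ID NO:38 if no residue has been lost in transcription. Raising an error on unknown tokens, rather than skipping them, is a deliberate choice so that damaged codes are surfaced.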
We Claim: 

1. An isolated and purified human protein having an amino acid 
sequence selected from the group consisting of the amino acid sequences shown in 
SEQ ID NOs:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 
and 38. 

2. An isolated and purified human protein having an amino acid 
sequence which is at least 85% identical to an amino acid sequence selected from 
the group consisting of the amino acid sequences shown in SEQ ID NOs:20, 21, 22, 
23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. 

3. An isolated and purified human polypeptide comprising at least 6 
contiguous amino acids of an amino acid sequence selected from the group 
consisting of the amino acid sequences shown in SEQ ID NOs:20, 21, 22, 23, 24, 
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. 

4. A fusion protein comprising a first protein segment and a second 
protein segment fused together by means of a peptide bond, wherein the first 
protein segment consists of at least 6 contiguous amino acids selected from the 
group consisting of the amino acid sequences shown in SEQ ID NOs:20, 21, 22, 23, 
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. 

5. A preparation of antibodies which specifically bind to the human 
protein of claim 1. 

6. An isolated and purified subgenomic polynucleotide having a 
nucleotide sequence selected from the group consisting of the nucleotide sequences 
shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 
and 19. 

7. An isolated gene corresponding to a cDNA sequence selected from 
the group consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. 

8. A DNA construct for expressing all or a portion of a human protein 
having an amino acid sequence selected from the group consisting of the amino acid 
sequences shown in SEQ ID NOs:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 
33, 34, 35, 36, 37, and 38, comprising: 
a promoter; and 

a polynucleotide segment encoding at least 6 contiguous amino acids 
of the human protein, wherein the polynucleotide segment is located downstream 
from the promoter, wherein transcription of the polynucleotide segment initiates at 
or 3' to the promoter. 

9. A host cell comprising a DNA construct comprising: 
a promoter; and 

a polynucleotide segment encoding at least 6 contiguous amino acids 
of a human protein having an amino acid sequence selected from the group 
consisting of the amino acid sequences shown in SEQ ID NOs:20, 21, 22, 23, 24, 
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38, wherein the 
polynucleotide segment is located downstream from the promoter and wherein 
transcription of the polynucleotide segment initiates at or 3' to the promoter. 

10. A homologously recombinant cell having incorporated therein a new 
transcription initiation unit, wherein the new transcription initiation unit comprises 
in 5' to 3' order: 

(a) an exogenous regulatory sequence; 

(b) an exogenous exon; and 

(c) a splice donor site, 

wherein the transcription initiation unit is located upstream to a coding sequence of 
a gene, wherein the gene comprises a nucleotide sequence selected from the group 
consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19 and wherein the exogenous regulatory 
sequence controls transcription of the coding sequence of the gene. 

11. A method of producing a human protein, comprising the steps of: 
growing a culture of a cell comprising a DNA construct comprising 
(1) a promoter and (2) a polynucleotide segment encoding at least 6 contiguous 
amino acids of a human protein having an amino acid sequence selected from the 
group consisting of the amino acid sequences shown in SEQ ID NOs:20, 21, 22, 
23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38, wherein the 
polynucleotide segment is located downstream from the promoter and wherein 
transcription of the polynucleotide segment initiates at or 3' to the promoter; and 
purifying the protein from the culture. 

12. A method of producing a human protein, comprising the steps of: 
growing a culture of a homologously recombinant cell having 
incorporated therein a new transcription initiation unit, wherein the new 
transcription initiation unit comprises in 5' to 3' order: 

(a) an exogenous regulatory sequence; 

(b) an exogenous exon; and 

(c) a splice donor site, 

wherein the transcription initiation unit is located upstream to a coding sequence of 
a gene, wherein the gene comprises a nucleotide sequence selected from the group 
consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19 and wherein the exogenous regulatory 
sequence controls transcription of the coding sequence of the gene; and 
purifying the protein from the culture. 

13. A method of identifying a secreted polypeptide which is modified by 
rough microsomes, comprising the steps of: 

transcribing in vitro a population of cDNA molecules whereby a 
population of cRNA molecules is formed; 

translating a first portion of the population of cRNA molecules in 
vitro in the absence of rough microsomes whereby a first population of polypeptides 
is formed; 

translating a second portion of the population of cRNA molecules in vitro in 
the presence of rough microsomes whereby a second population of polypeptides is 
formed; 

comparing the first population of polypeptides with the second 
population of polypeptides; and 

detecting polypeptide members of the second population which have 
been modified by the rough microsomes. 
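
By way of illustration only, and not as part of the claims, the two sequence criteria recited in claims 2 and 3 ("at least 85% identical" and "at least 6 contiguous amino acids") can be sketched computationally. The sketch below rests on stated assumptions: sequences are given in one-letter code, the two sequences compared for identity have already been aligned to equal length with `-` marking gaps, and identity is taken as matching positions divided by alignment length. The claims do not specify an alignment method or an identity formula, so these choices, and the names `percent_identity` and `has_contiguous_fragment`, are hypothetical.

```python
# Illustrative sketch only: the identity formula and helper names are assumptions.

def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both sequences.

    Assumes the inputs are pre-aligned to equal length, with '-' for gaps.
    Gap columns never count as matches but do count toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must be non-empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def has_contiguous_fragment(candidate, reference, k=6):
    """Return True if `candidate` contains any run of k contiguous residues
    that also occurs contiguously in `reference`."""
    fragments = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(
        candidate[i:i + k] in fragments
        for i in range(len(candidate) - k + 1)
    )


# First 16 residues of SEQ ID NO:38 in one-letter code, and a hypothetical
# variant differing only at the last position (K replaced by R).
reference = "MVKVTFNSALAQKEAK"
variant = "MVKVTFNSALAQKEAR"

identity = percent_identity(reference, variant)
meets_85_percent = identity >= 85.0
shares_six_mer = has_contiguous_fragment("XXTFNSALXX", reference)
```

Here 15 of 16 aligned positions match, so the hypothetical variant would satisfy an 85% threshold under this particular formula; a different alignment or denominator convention could give a different figure.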
1. [ ] Claims Nos.: 

because they relate to subject matter not required to be searched by this Authority, namely: 



2. [ ] Claims Nos.: 

because they relate to parts of the International Application that do not comply with the prescribed requirements to such 
an extent that no meaningful International Search can be carried out, specifically: 



3. [ ] Claims Nos.: 

because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a). 



Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet) 

This International Searching Authority found multiple inventions in this international application, as follows: 

see annex 

1. [ ] As all required additional search fees were timely paid by the applicant, this International Search Report covers all 
searchable claims. 

2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment 
of any additional fee. 

3. [ ] As only some of the required additional search fees were timely paid by the applicant, this International Search Report 
covers only those claims for which fees were paid, specifically claims Nos.: 

4. [X] No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is 
restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 

Claims 1-12 (partially) (Extra sheet-1) 

Remark on Protest 

[ ] The additional search fees were accompanied by the applicant's protest. 
[ ] No protest accompanied the payment of additional search fees. 

Form PCT/ISA/210 (continuation of first sheet (1)) (July 1992) 

FURTHER INFORMATION CONTINUED FROM PCT/ISA/ 



1. Claims: 1-12 partially 

An isolated human protein having an amino acid sequence 
according to SEQ ID No. 20, homologs and fragments thereof, 
fusion proteins therewith, antibodies thereto. 
An isolated and purified polynucleotide having the sequence 
according to SEQ ID No. 1, the corresponding gene, DNA 
constructs, host cells, and homologously recombinant cells 
comprising said polynucleotide. 

A method of producing said protein using said DNA sequences, 
constructs, or cells. 

2. Claims: 1-12 partially 

idem for SEQ ID No. 21, 2 

3. Claims: 1-12 partially 

idem for SEQ ID No. 22, 3 

4. Claims: 1-12 partially 

idem for SEQ ID No. 23, 4 

5. Claims: 1-12 partially 

idem for SEQ ID No. 24, 5 

6. Claims: 1-12 partially 

idem for SEQ ID No. 25, 6 

7. Claims: 1-12 partially 

idem for SEQ ID No. 26, 7 

8. Claims: 1-12 partially 

idem for SEQ ID No. 27, 8 

9. Claims: 1-12 partially 

idem for SEQ ID No. 28, 9 

10. Claims: 1-12 partially 

idem for SEQ ID No. 29, 10 

11. Claims: 1-12 partially 

idem for SEQ ID No. 30, 11 

12. Claims: 1-12 partially 

idem for SEQ ID No. 31, 12 

13. Claims: 1-12 partially 

idem for SEQ ID No. 32, 13 

14. Claims: 1-12 partially 

idem for SEQ ID No. 33, 14 

15. Claims: 1-12 partially 

idem for SEQ ID No. 34, 15 

16. Claims: 1-12 partially 

idem for SEQ ID No. 35, 16 

17. Claims: 1-12 partially 

idem for SEQ ID No. 36, 17 

18. Claims: 1-12 partially 

idem for SEQ ID No. 37, 18 

19. Claims: 1-12 partially 

idem for SEQ ID No. 38, 19 



20. Claim: 13 



A method of identifying a secreted polypeptide which is 
modified by rough microsomes, comprising: 
in vitro transcription of a cDNA population, translation of 
a first portion in the absence of rough microsomes in vitro, 
translation of a second portion in the presence of rough 
microsomes in vitro, comparison of the polypeptides of the 
first and second portion, and detection of members of the 
second portion that have been modified by the rough 
microsomes. 